[jira] [Created] (ZEPPELIN-3522) String "defaultValue" (instead of boolean) in some "interpreter-settings.json" files

2018-06-01 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3522:
-

 Summary: String "defaultValue" (instead of boolean) in some 
"interpreter-settings.json" files
 Key: ZEPPELIN-3522
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3522
 Project: Zeppelin
  Issue Type: Bug
  Components: conf, Interpreters
Affects Versions: 0.7.3, 0.8.0
Reporter: Sanjay Dasgupta
 Fix For: 0.8.0, 0.7.4


The _interpreter-settings.json_ file for each interpreter has details of each 
configurable parameter for that interpreter. Each parameter also has a 
_defaultValue_ setting. For boolean-typed parameters the _defaultValue_ must be 
set to _true_ or _false_.

 

But in some of these _interpreter-settings.json_ files, the _defaultValue_ has 
been set to the string values _"true"_ and _"false"_ (that is, the quotation 
marks are part of the stored value). 
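
For example, a boolean-typed parameter should carry a JSON boolean as its default 
(the fragment below is illustrative only, not taken from any particular 
interpreter's file; only the _defaultValue_ line matters):

  "defaultValue": true

whereas the affected files carry the string form instead:

  "defaultValue": "true"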





[jira] [Created] (ZEPPELIN-3493) "Export all data as csv" not exporting all data

2018-05-23 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3493:
-

 Summary: "Export all data as csv" not exporting all data
 Key: ZEPPELIN-3493
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3493
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end, Interpreters
Affects Versions: 0.8.0
Reporter: Sanjay Dasgupta
 Fix For: 0.8.0


The "Export all data as csv" menu item (top right of grid UI) appears to export 
the same number of records as the "Export visible data as csv".

When tested using z.show(...) to display a dataframe containing more than 
zeppelin.spark.maxResult records, the output exported by both these commands 
was limited by zeppelin.spark.maxResult, and contained exactly the same number 
of records.
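
A minimal reproduction sketch for a Spark (Scala) paragraph (it assumes the 
default zeppelin.spark.maxResult limit, normally 1000, has not been raised):

// Create a dataframe with far more rows than zeppelin.spark.maxResult ...
val df = spark.range(0, 5000).toDF("id")
z.show(df)
// Both "Export all data as csv" and "Export visible data as csv" then export
// only zeppelin.spark.maxResult rows instead of all 5000.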





[jira] [Created] (ZEPPELIN-3459) Passing Z variables to Markdown interpreter

2018-05-14 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3459:
-

 Summary: Passing Z variables to Markdown interpreter
 Key: ZEPPELIN-3459
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3459
 Project: Zeppelin
  Issue Type: New Feature
  Components: conf, documentation, Interpreters
Affects Versions: 0.7.3, 0.8.0
Reporter: Sanjay Dasgupta
Assignee: Sanjay Dasgupta
 Fix For: 0.7.4, 0.9.0, 0.8.1


This issue documents the interpolation of ZeppelinContext objects into the 
paragraph text of Markdown cells. It is a child of the umbrella issue 
ZEPPELIN-3342 (Passing Z variables to ALL interpreters) and a grandchild of the 
issue ZEPPELIN-1967. 

The implementation will take the same approach that was followed in 
[PR-2898|https://github.com/apache/zeppelin/pull/2898] and 
[PR-2903|https://github.com/apache/zeppelin/pull/2903].
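
A short sketch of the intended usage (the variable name is arbitrary, and the 
{name} interpolation pattern is assumed to be the one adopted for ZEPPELIN-1967):

// Scala paragraph: publish a value into the resource pool ...
z.put("release", "0.8.0")

%md Release notes for Zeppelin {release}

The Markdown paragraph would then render as "Release notes for Zeppelin 0.8.0".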





[jira] [Created] (ZEPPELIN-3438) Passing Z variables to BigQuery interpreter

2018-05-01 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3438:
-

 Summary: Passing Z variables to BigQuery interpreter
 Key: ZEPPELIN-3438
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3438
 Project: Zeppelin
  Issue Type: New Feature
  Components: conf, documentation, Interpreters
Affects Versions: 0.8.0, 0.7.4, 0.9.0
Reporter: Sanjay Dasgupta


This issue documents the interpolation of ZeppelinContext objects into the 
paragraph text of BigQuery cells. It is a child of the umbrella issue 
ZEPPELIN-3342 (Passing Z variables to ALL interpreters) and a grandchild of the 
issue ZEPPELIN-1967. 

The implementation will take the same approach that was followed in 
[PR-2898|https://github.com/apache/zeppelin/pull/2898] and 
[PR-2903|https://github.com/apache/zeppelin/pull/2903].
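
A short sketch of the intended usage in a BigQuery paragraph (the table and 
variable names are hypothetical; the {name} interpolation pattern from 
ZEPPELIN-1967 is assumed):

// Scala paragraph: publish the value to be interpolated ...
z.put("min_year", 2015)

%bigquery
SELECT year, COUNT(*) AS n FROM my_dataset.events WHERE year >= {min_year} GROUP BY year

The {min_year} pattern would be replaced with 2015 before the query reaches BigQuery.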





[jira] [Created] (ZEPPELIN-3388) Refactor documentation for ZeppelinContext

2018-04-05 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3388:
-

 Summary: Refactor documentation for ZeppelinContext
 Key: ZEPPELIN-3388
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3388
 Project: Zeppelin
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.8.0, 0.7.4, 0.9.0
Reporter: Sanjay Dasgupta


The description of ZeppelinContext is now almost entirely contained within the 
Spark interpreter's documentation 
([spark.md|https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md]).
 But ZeppelinContext has many generic features that are available to all 
interpreters, so it should have a more visible and independent presence in the 
Zeppelin documentation.





[jira] [Created] (ZEPPELIN-3383) ZeppelinContext Get-Form-Input-Data method

2018-04-03 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3383:
-

 Summary: ZeppelinContext Get-Form-Input-Data method
 Key: ZEPPELIN-3383
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3383
 Project: Zeppelin
  Issue Type: New Feature
  Components: documentation, GUI
Reporter: Sanjay Dasgupta


There have been requests for a method to enable programmatic access to form 
input data (e.g. 
[Zeppelin-425|https://issues.apache.org/jira/browse/ZEPPELIN-425]). It is 
proposed to augment ZeppelinContext with the following method to allow such 
access:

{{z.getFormInput("var-name")}}

The availability of _getFormInput()_ will also allow form inputs to be used 
globally across the notebook (also often requested, e.g. 
[Zeppelin-1680|https://issues.apache.org/jira/browse/ZEPPELIN-1680]) by 
creating a Z variable with the same name:

{{z.put("var-name", z.getFormInput("var-name"))}}
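
A fuller usage sketch (the form name is arbitrary; _getFormInput()_ is the 
proposed method and does not exist yet):

// Paragraph 1: render a text-input form in the usual way ...
z.textbox("city", "Pune")

// Paragraph 2: read that form's input programmatically (proposed), and publish
// it as a Z variable so other paragraphs and interpreters can use it ...
val city = z.getFormInput("city")
z.put("city", city)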





[jira] [Created] (ZEPPELIN-3377) Passing Z variables to JDBC interpreter

2018-04-01 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3377:
-

 Summary: Passing Z variables to JDBC interpreter
 Key: ZEPPELIN-3377
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3377
 Project: Zeppelin
  Issue Type: New Feature
  Components: conf, documentation, Interpreters
Affects Versions: 0.8.0, 0.7.4, 0.9.0
Reporter: Sanjay Dasgupta


This issue documents the interpolation of ZeppelinContext objects into the 
paragraph text of JDBC cells. It is a child of the umbrella issue ZEPPELIN-3342 
(Passing Z variables to ALL interpreters) and a grandchild of the issue 
ZEPPELIN-1967.

The implementation will take the same approach that was followed in 
[PR-2898|https://github.com/apache/zeppelin/pull/2898].
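
A short sketch of the intended usage (the table and column names are hypothetical; 
the {name} interpolation pattern from ZEPPELIN-1967 is assumed):

// Scala paragraph: publish the value to be interpolated ...
z.put("report_date", "2018-04-01")

%jdbc
SELECT * FROM sales WHERE sale_date = '{report_date}'

The {report_date} pattern would be replaced with the Z variable's value before 
the SQL is sent to the database.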





[jira] [Created] (ZEPPELIN-3342) Passing Z variables to ALL interpreters

2018-03-15 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-3342:
-

 Summary: Passing Z variables to ALL interpreters
 Key: ZEPPELIN-3342
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3342
 Project: Zeppelin
  Issue Type: New Feature
  Components: Interpreters
Affects Versions: 0.8.0, 0.9.0
Reporter: Sanjay Dasgupta


This is a follow-on issue to ZEPPELIN-1967 (Passing Z variables to Shell and 
SQL Interpreters). It envisages extending the functionality of ZEPPELIN-1967 to 
all Zeppelin interpreters.

An examination of the source code of the Zeppelin interpreters shows that the 
same functionality can be extended to all of them (with a few exceptions) by 
changing just one or two lines in or around the _interpret_ method of each 
_Interpreter_ subclass. The implementation approach can be seen in 
[PR-2834|https://github.com/apache/zeppelin/pull/2834], associated with 
ZEPPELIN-1967.
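
The interpolation step itself is small. A minimal, illustrative sketch (this is 
not the PR-2834 code; the helper and its signature are invented for illustration) 
would replace each {name} pattern in the paragraph text with the string form of 
the matching resource-pool object just before _interpret_ runs:

import scala.util.matching.Regex

// Replace {name} patterns with looked-up values; leave unknown names untouched.
def interpolate(text: String, lookup: String => Option[Any]): String =
  """\{([^{}]+)\}""".r.replaceAllIn(text, m =>
    Regex.quoteReplacement(lookup(m.group(1)).map(_.toString).getOrElse(m.matched)))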





[jira] [Created] (ZEPPELIN-2849) Passing Z variables to SHELL Interpreter (One part of ZEPPELIN-1967)

2017-08-13 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-2849:
-

 Summary: Passing Z variables to SHELL Interpreter (One part of 
ZEPPELIN-1967)
 Key: ZEPPELIN-2849
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2849
 Project: Zeppelin
  Issue Type: New Feature
  Components: Interpreters
Affects Versions: 0.7.0, 0.8.0
Reporter: Sanjay Dasgupta


The issue https://issues.apache.org/jira/browse/ZEPPELIN-1967 requests 
implementation of the same functionality in several different interpreters (and 
in different interpreter groups), but it may be simpler to implement the feature 
separately in each interpreter of each group.

This issue has been created to accompany an implementation for the SHELL 
interpreter.
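
A short sketch of the intended usage in a shell paragraph (the path is 
hypothetical; the {name} interpolation pattern is assumed):

// Scala paragraph: publish the value to be interpolated ...
z.put("logdir", "/var/log/zeppelin")

%sh
ls -l {logdir}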





[jira] [Created] (ZEPPELIN-2807) Passing Z variables to SQL Interpreter (One part of ZEPPELIN-1967)

2017-07-24 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created ZEPPELIN-2807:
-

 Summary: Passing Z variables to SQL Interpreter (One part of 
ZEPPELIN-1967)
 Key: ZEPPELIN-2807
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2807
 Project: Zeppelin
  Issue Type: New Feature
  Components: Interpreters
Affects Versions: 0.7.0, 0.8.0
Reporter: Sanjay Dasgupta


The issue https://issues.apache.org/jira/browse/ZEPPELIN-1967 requests 
implementation of the same functionality in several different interpreters (and 
in different interpreter groups), but it may be simpler to implement the feature 
separately in each interpreter of each group.

This issue has been created to accompany an implementation for the Spark SQL 
interpreter.
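
A short sketch of the intended usage in a Spark SQL paragraph (the word_counts 
table is hypothetical; the {name} interpolation pattern is assumed):

// Scala paragraph: publish the value to be interpolated ...
z.put("min_freq", 100)

%sql
SELECT word, freq FROM word_counts WHERE freq > {min_freq}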





[jira] [Commented] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2

2016-12-30 Thread Sanjay Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787426#comment-15787426
 ] 

Sanjay Dasgupta commented on SPARK-19034:
-

Yes, the SPARK_HOME was the issue.

Apologies for the confusion.

> Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
> 
>
> Key: SPARK-19034
> URL: https://issues.apache.org/jira/browse/SPARK-19034
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.0
> Environment: All
>Reporter: Sanjay Dasgupta
>  Labels: distribution, download
>
> Download packages on 'https://spark.apache.org/downloads.html' have the right 
> name ( spark-2.1.0-bin-...) but contain the release 2.0.2 software






[jira] [Commented] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2

2016-12-30 Thread Sanjay Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787390#comment-15787390
 ] 

Sanjay Dasgupta commented on SPARK-19034:
-

The "Direct download" link to the "Pre-built for Hadoop 2.4" package is the 
following:

http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.4.tgz

When I run the "spark-shell" from this package it clearly announces itself as 
"version 2.0.2". Running "spark.version" in the REPL also produces "res0: 
String = 2.0.2".

> Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
> 
>
> Key: SPARK-19034
> URL: https://issues.apache.org/jira/browse/SPARK-19034
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.0
> Environment: All
>Reporter: Sanjay Dasgupta
>  Labels: distribution, download
>
> Download packages on 'https://spark.apache.org/downloads.html' have the right 
> name ( spark-2.1.0-bin-...) but contain the release 2.0.2 software






[jira] [Created] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2

2016-12-29 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created SPARK-19034:
---

 Summary: Download packages on 'spark.apache.org/downloads.html' 
contain release 2.0.2
 Key: SPARK-19034
 URL: https://issues.apache.org/jira/browse/SPARK-19034
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.1.0
 Environment: All
Reporter: Sanjay Dasgupta


Download packages on 'https://spark.apache.org/downloads.html' have the right 
name (spark-2.1.0-bin-...) but contain the release 2.0.2 software.






[jira] [Created] (SPARK-16347) DataFrame allows duplicate column-names

2016-07-01 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created SPARK-16347:
---

 Summary: DataFrame allows duplicate column-names
 Key: SPARK-16347
 URL: https://issues.apache.org/jira/browse/SPARK-16347
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
 Environment: Databricks community edition
Scala notebook in Google-Chrome
Linux (Ubuntu 14.04LTS)
Reporter: Sanjay Dasgupta


Certain DataFrame APIs allow duplicate column-names. The following code 
illustrates the problem:

// Note: the implicit conversions that provide toDF (spark.implicits._) are
// pre-imported in the Databricks notebook used here; add the import explicitly
// when running this elsewhere.
case class Row(integer: Int, string1: String, string2: String)

val rows = spark.sparkContext.parallelize(Seq(
  Row(1, "one", "one"), Row(2, "two", "two"), Row(3, "three", "three")))

// DUPLICATED COLUMN-NAMES ...
val df = rows.toDF("integer", "string", "string")
df.printSchema()

Here is the output:

root
 |-- integer: integer (nullable = false)
 |-- string: string (nullable = true)
 |-- string: string (nullable = true)

defined class Row
rows: org.apache.spark.rdd.RDD[Row] = ParallelCollectionRDD[168] at parallelize at :39
df: org.apache.spark.sql.DataFrame = [integer: int, string: string ... 1 more field]






[jira] [Created] (SPARK-15964) Assignment to RDD-typed val fails

2016-06-15 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created SPARK-15964:
---

 Summary: Assignment to RDD-typed val fails
 Key: SPARK-15964
 URL: https://issues.apache.org/jira/browse/SPARK-15964
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: Notebook on Databricks Community-Edition 
Spark-2.0 preview
Google Chrome Browser
Linux Ubuntu 14.04 LTS
Reporter: Sanjay Dasgupta


An unusual assignment error is reported, with the following message:

found : org.apache.spark.rdd.RDD[Name]
required : org.apache.spark.rdd.RDD[Name]

This occurs when the assignment is attempted in a cell different from the one in 
which the right-hand-side value is defined, as in the following example:

// CELL-1
import org.apache.spark.sql.Dataset
import org.apache.spark.rdd.RDD

case class Name(number: Int, name: String)
val names = Seq(Name(1, "one"), Name(2, "two"), Name(3, "three"), Name(4, "four"))
val dataset: Dataset[Name] = spark.sparkContext.parallelize(names).toDF.as[Name]

// CELL-2
// Error reported here ...
val dataRdd: RDD[Name] = dataset.rdd

The error is reported in CELL-2.






[jira] [Created] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes

2016-06-02 Thread Sanjay Dasgupta (JIRA)
Sanjay Dasgupta created SPARK-15732:
---

 Summary: Dataset generated code "generated.java" Fails with 
Certain Case Classes
 Key: SPARK-15732
 URL: https://issues.apache.org/jira/browse/SPARK-15732
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
 Environment: Version 2.0 Preview on the Databricks Community Edition
Reporter: Sanjay Dasgupta


The Dataset code generation logic fails to handle field-names in case classes 
that are also Java keywords (e.g. "abstract"). Scala has an escaping mechanism 
(using backquotes) that allows Java (and Scala) keywords to be used as names in 
programs, as in the example below:

case class PatApp(number: Int, title: String, `abstract`: String)

But this case class trips up the Dataset code generator. The following error 
message is displayed when Datasets containing instances of such case classes 
are processed.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in 
stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 
(TID 1304, localhost): java.lang.RuntimeException: Error while encoding: 
java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', 
Line 60, Column 84: Unexpected selector 'abstract' after "."

The following code can be used to replicate the problem. This code was run on 
the Databricks CE, in a Scala notebook, in 3 separate cells as shown below:

// CELL 1:
//
// Create a Case Class with "abstract" as a field-name ...
//
package keywordissue
// The field-name abstract is a Java keyword ...
case class PatApp(number: Int, title: String, `abstract`: String)

// CELL 2:
//
// Create a Dataset using the case class ...
//
import keywordissue.PatApp

val applications = List(
  PatApp(1001, "1001", "Abstract 1001"),
  PatApp(1002, "1002", "Abstract 1002"),
  PatApp(1003, "1003", "Abstract for 1003"),
  PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
val appsDataset = sc.parallelize(applications).toDF.as[PatApp]

// CELL 3:
//
// Force Dataset code-generation. This causes the error message to display ...
//
val duplicates = appsDataset
  .groupByKey(_.number)
  .mapGroups((k, i) => (k, i.length))
  .filter(_._2 > 0)
duplicates.collect().foreach(println)


