[jira] [Created] (ZEPPELIN-3522) String "defaultValue" (instead of boolean) in some "interpreter-settings.json" files
Sanjay Dasgupta created ZEPPELIN-3522: - Summary: String "defaultValue" (instead of boolean) in some "interpreter-settings.json" files Key: ZEPPELIN-3522 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3522 Project: Zeppelin Issue Type: Bug Components: conf, Interpreters Affects Versions: 0.7.3, 0.8.0 Reporter: Sanjay Dasgupta Fix For: 0.8.0, 0.7.4 The _interpreter-settings.json_ file for each interpreter has details of each configurable parameter for that interpreter. Each parameter also has a _defaultValue_ setting. For boolean-typed parameters the _defaultValue_ must be set to _true_ or _false_. But in some of these _interpreter-settings.json_ files, the _defaultValue_ has been set to the string values _"true"_ and _"false"_ (the quote marks are included in the value provided). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
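For reference, the difference is between a JSON boolean and a JSON string. A hedged illustration follows (the property names and surrounding fields are invented for this example, following the usual interpreter-settings.json layout, and are not copied from any actual file):

```json
{
  "zeppelin.example.goodFlag": {
    "propertyName": "zeppelin.example.goodFlag",
    "defaultValue": true,
    "description": "Correct: defaultValue is the JSON boolean true."
  },
  "zeppelin.example.badFlag": {
    "propertyName": "zeppelin.example.badFlag",
    "defaultValue": "true",
    "description": "Wrong: defaultValue is the JSON string \"true\"."
  }
}
```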
[jira] [Created] (ZEPPELIN-3493) "Export all data as csv" not exporting all data
Sanjay Dasgupta created ZEPPELIN-3493: - Summary: "Export all data as csv" not exporting all data Key: ZEPPELIN-3493 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3493 Project: Zeppelin Issue Type: Bug Components: front-end, Interpreters Affects Versions: 0.8.0 Reporter: Sanjay Dasgupta Fix For: 0.8.0 The "Export all data as csv" menu item (top right of the grid UI) appears to export the same number of records as the "Export visible data as csv" item. When tested using z.show(...) to display a dataframe containing more than zeppelin.spark.maxResult records, the output exported by both commands was limited by zeppelin.spark.maxResult and contained exactly the same number of records.
[jira] [Created] (ZEPPELIN-3459) Passing Z variables to Markdown interpreter
Sanjay Dasgupta created ZEPPELIN-3459: - Summary: Passing Z variables to Markdown interpreter Key: ZEPPELIN-3459 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3459 Project: Zeppelin Issue Type: New Feature Components: conf, documentation, Interpreters Affects Versions: 0.7.3, 0.8.0 Reporter: Sanjay Dasgupta Assignee: Sanjay Dasgupta Fix For: 0.7.4, 0.9.0, 0.8.1 This issue documents the interpolation of ZeppelinContext objects into the paragraph text of Markdown cells. It is a child of the umbrella issue ZEPPELIN-3342 (Passing Z variables to ALL interpreters) and a grandchild of the issue ZEPPELIN-1967. The implementation will take the same approach that was followed in [PR-2898|https://github.com/apache/zeppelin/pull/2898] and [PR-2903|https://github.com/apache/zeppelin/pull/2903].
[jira] [Created] (ZEPPELIN-3438) Passing Z variables to BigQuery interpreter
Sanjay Dasgupta created ZEPPELIN-3438: - Summary: Passing Z variables to BigQuery interpreter Key: ZEPPELIN-3438 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3438 Project: Zeppelin Issue Type: New Feature Components: conf, documentation, Interpreters Affects Versions: 0.8.0, 0.7.4, 0.9.0 Reporter: Sanjay Dasgupta This issue documents the interpolation of ZeppelinContext objects into the paragraph text of BigQuery cells. It is a child of the umbrella issue ZEPPELIN-3342 (Passing Z variables to ALL interpreters) and a grandchild of the issue ZEPPELIN-1967. The implementation will take the same approach that was followed in [PR-2898|https://github.com/apache/zeppelin/pull/2898] and [PR-2903|https://github.com/apache/zeppelin/pull/2903].
[jira] [Created] (ZEPPELIN-3388) Refactor documentation for ZeppelinContext
Sanjay Dasgupta created ZEPPELIN-3388: - Summary: Refactor documentation for ZeppelinContext Key: ZEPPELIN-3388 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3388 Project: Zeppelin Issue Type: Improvement Components: documentation Affects Versions: 0.8.0, 0.7.4, 0.9.0 Reporter: Sanjay Dasgupta The description of ZeppelinContext is now almost entirely contained within the Spark interpreter's documentation ([spark.md|https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md]). But ZeppelinContext has many generic features that are available to all interpreters, and it is important for ZeppelinContext to have a more visible and independent presence in the Zeppelin documentation.
[jira] [Created] (ZEPPELIN-3383) ZeppelinContext Get-Form-Input-Data method
Sanjay Dasgupta created ZEPPELIN-3383: - Summary: ZeppelinContext Get-Form-Input-Data method Key: ZEPPELIN-3383 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3383 Project: Zeppelin Issue Type: New Feature Components: documentation, GUI Reporter: Sanjay Dasgupta There have been requests for a method to enable programmatic access to form input data (e.g. [ZEPPELIN-425|https://issues.apache.org/jira/browse/ZEPPELIN-425]). It is proposed to augment ZeppelinContext with the following method to allow such access: {{z.getFormInput("var-name")}} The availability of _getFormInput()_ will also allow form inputs to be used globally across the notebook (also often requested, e.g. [ZEPPELIN-1680|https://issues.apache.org/jira/browse/ZEPPELIN-1680]) by creating a Z variable with the same name: {{z.put("var-name", z.getFormInput("var-name"))}}
[jira] [Created] (ZEPPELIN-3377) Passing Z variables to JDBC interpreter
Sanjay Dasgupta created ZEPPELIN-3377: - Summary: Passing Z variables to JDBC interpreter Key: ZEPPELIN-3377 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3377 Project: Zeppelin Issue Type: New Feature Components: conf, documentation, Interpreters Affects Versions: 0.8.0, 0.7.4, 0.9.0 Reporter: Sanjay Dasgupta This issue documents the interpolation of ZeppelinContext objects into the paragraph text of JDBC cells. It is a child of the umbrella issue ZEPPELIN-3342 (Passing Z variables to ALL interpreters) and a grandchild of the issue ZEPPELIN-1967. The implementation will take the same approach that was followed in [PR-2898|https://github.com/apache/zeppelin/pull/2898].
[jira] [Created] (ZEPPELIN-3342) Passing Z variables to ALL interpreters
Sanjay Dasgupta created ZEPPELIN-3342: - Summary: Passing Z variables to ALL interpreters Key: ZEPPELIN-3342 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3342 Project: Zeppelin Issue Type: New Feature Components: Interpreters Affects Versions: 0.8.0, 0.9.0 Reporter: Sanjay Dasgupta This is a follow-on issue to ZEPPELIN-1967 (Passing Z variables to Shell and SQL Interpreters). It envisages the extension of the functionality in ZEPPELIN-1967 to all Zeppelin interpreters. An examination of the source code of the Zeppelin interpreters shows that the same functionality can be extended to all interpreters (with a few exceptions) by making simple changes in just one or two lines in or around the _interpret_ method of each _Interpreter_ sub-class. The implementation approach can be seen in [PR-2834|https://github.com/apache/zeppelin/pull/2834], associated with ZEPPELIN-1967.
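As a rough illustration of the approach (not the actual code from PR-2834; the placeholder syntax, helper name, and use of a plain Map in place of the ZeppelinContext store are all assumptions for this sketch), the per-interpreter change amounts to passing the paragraph text through a small substitution step before _interpret_ runs:

```scala
import scala.util.matching.Regex

// Hedged sketch: replace {name} placeholders in paragraph text with values
// from a map standing in for the ZeppelinContext variable store.
// Placeholders with no matching variable are left untouched.
def interpolate(text: String, z: Map[String, Any]): String =
  """\{(\w+)\}""".r.replaceAllIn(text, m =>
    Regex.quoteReplacement(
      z.get(m.group(1)).map(_.toString).getOrElse(m.matched)))
```

For example, interpolate("select * from {tbl}", Map("tbl" -> "users")) yields "select * from users".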
[jira] [Created] (ZEPPELIN-2849) Passing Z variables to SHELL Interpreter (One part of ZEPPELIN-1967)
Sanjay Dasgupta created ZEPPELIN-2849: - Summary: Passing Z variables to SHELL Interpreter (One part of ZEPPELIN-1967) Key: ZEPPELIN-2849 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2849 Project: Zeppelin Issue Type: New Feature Components: Interpreters Affects Versions: 0.7.0, 0.8.0 Reporter: Sanjay Dasgupta The issue https://issues.apache.org/jira/browse/ZEPPELIN-1967 requests implementation of the same functionality in different interpreters (and in different interpreter groups). But it may be simpler to implement the functionality separately in each interpreter of each group. This issue has been created to accompany an implementation for the SHELL interpreter.
[jira] [Created] (ZEPPELIN-2807) Passing Z variables to SQL Interpreter (One part of ZEPPELIN-1967)
Sanjay Dasgupta created ZEPPELIN-2807: - Summary: Passing Z variables to SQL Interpreter (One part of ZEPPELIN-1967) Key: ZEPPELIN-2807 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2807 Project: Zeppelin Issue Type: New Feature Components: Interpreters Affects Versions: 0.7.0, 0.8.0 Reporter: Sanjay Dasgupta The issue https://issues.apache.org/jira/browse/ZEPPELIN-1967 requests implementation of the same functionality in different interpreters (and in different interpreter groups). But it may be simpler to implement the functionality separately in each interpreter of each group. This issue has been created to accompany an implementation for the Spark SQL interpreter.
[jira] [Commented] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
[ https://issues.apache.org/jira/browse/SPARK-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787426#comment-15787426 ] Sanjay Dasgupta commented on SPARK-19034: - Yes, the SPARK_HOME was the issue. Apologies for the confusion.
[jira] [Commented] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
[ https://issues.apache.org/jira/browse/SPARK-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787390#comment-15787390 ] Sanjay Dasgupta commented on SPARK-19034: - The "Direct download" link to the "Pre-built for Hadoop 2.4" package is the following: http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.4.tgz When I run the "spark-shell" from this package it clearly announces itself as "version 2.0.2". Running "spark.version" in the REPL also produces "res0: String = 2.0.2"
[jira] [Created] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
Sanjay Dasgupta created SPARK-19034: --- Summary: Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2 Key: SPARK-19034 URL: https://issues.apache.org/jira/browse/SPARK-19034 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.1.0 Environment: All Reporter: Sanjay Dasgupta Download packages on 'https://spark.apache.org/downloads.html' have the right name (spark-2.1.0-bin-...) but contain the release 2.0.2 software
[jira] [Created] (SPARK-16347) DataFrame allows duplicate column-names
Sanjay Dasgupta created SPARK-16347: --- Summary: DataFrame allows duplicate column-names Key: SPARK-16347 URL: https://issues.apache.org/jira/browse/SPARK-16347 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: Databricks community edition, Scala notebook in Google-Chrome, Linux (Ubuntu 14.04 LTS) Reporter: Sanjay Dasgupta

Certain DataFrame APIs allow duplicate column-names. The following code illustrates the problem:

    case class Row(integer: Int, string1: String, string2: String)
    val rows = spark.sparkContext.parallelize(Seq(Row(1, "one", "one"), Row(2, "two", "two"), Row(3, "three", "three")))
    // DUPLICATED COLUMN-NAMES ...
    val df = rows.toDF("integer", "string", "string")
    df.printSchema()

Here is the output:

    root
     |-- integer: integer (nullable = false)
     |-- string: string (nullable = true)
     |-- string: string (nullable = true)

    defined class Row
    rows: org.apache.spark.rdd.RDD[Row] = ParallelCollectionRDD[168] at parallelize at :39
    df: org.apache.spark.sql.DataFrame = [integer: int, string: string ... 1 more field]
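Until the API rejects duplicates itself, a caller can guard against them before invoking toDF. A minimal sketch in plain Scala (duplicateColumns is a hypothetical helper, not part of Spark's API):

```scala
// Hypothetical pre-flight check: report any column name that appears
// more than once in the list passed to toDF.
def duplicateColumns(names: Seq[String]): Seq[String] =
  names.groupBy(identity).collect {
    case (name, group) if group.size > 1 => name
  }.toSeq
```

For the report's example, duplicateColumns(Seq("integer", "string", "string")) flags "string".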
[jira] [Created] (SPARK-15964) Assignment to RDD-typed val fails
Sanjay Dasgupta created SPARK-15964: --- Summary: Assignment to RDD-typed val fails Key: SPARK-15964 URL: https://issues.apache.org/jira/browse/SPARK-15964 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Environment: Notebook on Databricks Community-Edition, Spark-2.0 preview, Google Chrome Browser, Linux Ubuntu 14.04 LTS Reporter: Sanjay Dasgupta

An unusual assignment error, giving the following error message:

    found    : org.apache.spark.rdd.RDD[Name]
    required : org.apache.spark.rdd.RDD[Name]

This occurs when the assignment is attempted in a cell that is different from the cell in which the item on the right-hand side is defined, as in the following example:

    // CELL-1
    import org.apache.spark.sql.Dataset
    import org.apache.spark.rdd.RDD

    case class Name(number: Int, name: String)
    val names = Seq(Name(1, "one"), Name(2, "two"), Name(3, "three"), Name(4, "four"))
    val dataset: Dataset[Name] = spark.sparkContext.parallelize(names).toDF.as[Name]

    // CELL-2
    // Error reported here ...
    val dataRdd: RDD[Name] = dataset.rdd

The error is reported in CELL-2.
[jira] [Created] (SPARK-15732) Dataset generated code "generated.java" Fails with Certain Case Classes
Sanjay Dasgupta created SPARK-15732: --- Summary: Dataset generated code "generated.java" Fails with Certain Case Classes Key: SPARK-15732 URL: https://issues.apache.org/jira/browse/SPARK-15732 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: Version 2.0 Preview on the Databricks Community Edition Reporter: Sanjay Dasgupta

The Dataset code generation logic fails to handle field-names in case classes that are also Java keywords (e.g. "abstract"). Scala has an escaping mechanism (using backquotes) that allows Java (and Scala) keywords to be used as names in programs, as in the example below:

    case class PatApp(number: Int, title: String, `abstract`: String)

But this case class trips up the Dataset code generator. The following error message is displayed when Datasets containing instances of such case classes are processed:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 1 times, most recent failure: Lost task 2.0 in stage 54.0 (TID 1304, localhost): java.lang.RuntimeException: Error while encoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 84: Unexpected selector 'abstract' after "."

The following code can be used to replicate the problem. This code was run on the Databricks CE, in a Scala notebook, in 3 separate cells as shown below:

    // CELL 1:
    // Create a case class with "abstract" as a field-name ...
    package keywordissue

    // The field-name "abstract" is a Java keyword ...
    case class PatApp(number: Int, title: String, `abstract`: String)

    // CELL 2:
    // Create a Dataset using the case class ...
    import keywordissue.PatApp

    val applications = List(PatApp(1001, "1001", "Abstract 1001"),
      PatApp(1002, "1002", "Abstract 1002"),
      PatApp(1003, "1003", "Abstract for 1003"),
      PatApp(/* Duplicate! */ 1003, "1004", "Abstract 1004"))
    val appsDataset = sc.parallelize(applications).toDF.as[PatApp]

    // CELL 3:
    // Force Dataset code-generation. This causes the error message to display ...
    val duplicates = appsDataset.groupByKey(_.number).mapGroups((k, i) => (k, i.length)).filter(_._2 > 0)
    duplicates.collect().foreach(println)
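For reference, backquoted keyword identifiers are legal in plain Scala itself; the failure reported here is confined to Spark's generated Java code. A minimal stand-alone check (no Spark involved):

```scala
// Plain Scala accepts a Java keyword as a field name when it is backquoted;
// scalac compiles this without complaint, and the field is accessed with the
// same backquote syntax.
case class PatApp(number: Int, title: String, `abstract`: String)

val app = PatApp(1001, "1001", "Abstract 1001")
println(app.`abstract`)
```

This confirms that any fix belongs in the code generator (escaping the field name before emitting it into generated.java), not in the case-class declaration.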