[jira] [Commented] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279594#comment-15279594
 ] 

Sandeep Singh commented on SPARK-15266:
---

np.

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Closed] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon closed SPARK-15266.

Resolution: Duplicate

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Commented] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279591#comment-15279591
 ] 

Hyukjin Kwon commented on SPARK-15266:
--

Oh, sorry. I could not find it. Thank you for correcting me.

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Comment Edited] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279581#comment-15279581
 ] 

Sandeep Singh edited comment on SPARK-15266 at 5/11/16 5:46 AM:


This is a duplicate of https://issues.apache.org/jira/browse/SPARK-15037
I've written the Python code for this as part of SPARK-15037, but since that PR was 
big, I will create a new PR (https://github.com/apache/spark/pull/13044) for it.
Thanks


was (Author: techaddict):
This is a duplicate of https://issues.apache.org/jira/browse/SPARK-15037
I've written the Python code for this as part of SPARK-15037, but since that PR was 
big, I will create a new PR for it.
Thanks

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Commented] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279586#comment-15279586
 ] 

Apache Spark commented on SPARK-15037:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/13044

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Reporter: Dongjoon Hyun
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>
> This issue aims to update the existing testsuites to use `SparkSession` 
> instead of `SQLContext` since `SQLContext` exists just for backward 
> compatibility.






[jira] [Commented] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279581#comment-15279581
 ] 

Sandeep Singh commented on SPARK-15266:
---

This is a duplicate of https://issues.apache.org/jira/browse/SPARK-15037
I've written the Python code for this as part of SPARK-15037, but since that PR was 
big, I will create a new PR for it.
Thanks

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Resolved] (SPARK-15235) Corresponding row cannot be highlighted even though cursor is on the job on Web UI's timeline

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15235.
-
   Resolution: Fixed
 Assignee: Kousuke Saruta
Fix Version/s: 2.0.0

> Corresponding row cannot be highlighted even though cursor is on the job on 
> Web UI's timeline
> -
>
> Key: SPARK-15235
> URL: https://issues.apache.org/jira/browse/SPARK-15235
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Trivial
> Fix For: 2.0.0
>
>
> To extract job descriptions and stage names, there are the following regular 
> expressions in timeline-view.js:
> {code}
> var jobIdText = $($(baseElem).find(".application-timeline-content")[0]).text();
> var jobId = jobIdText.match("\\(Job (\\d+)\\)")[1];
> ...
> var stageIdText = $($(baseElem).find(".job-timeline-content")[0]).text();
> var stageIdAndAttempt = stageIdText.match("\\(Stage (\\d+\\.\\d+)\\)")[1].split(".");
> {code}
> But if job descriptions include patterns like "(Job x)" or stage names include 
> patterns like "(Stage x.y)", the regular expressions do not match as expected, 
> and the corresponding row cannot be highlighted even though we move the cursor 
> onto the job on the Web UI's timeline.
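To make the failure mode above concrete, here is a small Python rendering of the same first-match extraction (the real code is the JavaScript quoted above; the tooltip text below is made up): when the description itself contains a "(Job n)" pattern, the first match wins and the wrong id is extracted.

{code}
import re

# Equivalent of jobIdText.match("\\(Job (\\d+)\\)")[1] in timeline-view.js:
# take the first "(Job <digits>)" occurrence in the tooltip text.
job_id_pattern = re.compile(r"\(Job (\d+)\)")

# Hypothetical tooltip text: a user-supplied description that itself contains
# "(Job 5)", followed by the real "(Job 42)" marker appended by the timeline.
tooltip_text = "reprocess failed batch (Job 5) (Job 42)"

print(job_id_pattern.search(tooltip_text).group(1))  # prints "5", not "42"
{code}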






[jira] [Updated] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-15266:
-
Affects Version/s: 2.0.0

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Commented] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279575#comment-15279575
 ] 

Hyukjin Kwon commented on SPARK-15266:
--

I will work on this.

> Use SparkSession instead of SQLContext in Python tests
> --
>
> Key: SPARK-15266
> URL: https://issues.apache.org/jira/browse/SPARK-15266
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
> should be changed to {{SparkSession}}.






[jira] [Created] (SPARK-15266) Use SparkSession instead of SQLContext in Python tests

2016-05-10 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-15266:


 Summary: Use SparkSession instead of SQLContext in Python tests
 Key: SPARK-15266
 URL: https://issues.apache.org/jira/browse/SPARK-15266
 Project: Spark
  Issue Type: Test
  Components: PySpark
Reporter: Hyukjin Kwon


Currently, doctests and other tests in Python use the old {{SQLContext}}. This 
should be changed to {{SparkSession}}.
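For illustration only, a minimal sketch of the kind of change this asks for in PySpark tests; this is not the repository's actual test code, just the old {{SQLContext}} pattern next to the {{SparkSession}} pattern that should replace it.

{code}
from pyspark import SparkContext
from pyspark.sql import SQLContext, SparkSession

# Old pattern (pre-2.0): tests build a SQLContext on top of a SparkContext.
sc = SparkContext("local[2]", "legacy-style-test")
sqlContext = SQLContext(sc)
df_old = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
assert df_old.count() == 2
sc.stop()

# New pattern (2.0+): SparkSession is the single entry point for SQL tests.
spark = SparkSession.builder \
    .master("local[2]") \
    .appName("sparksession-style-test") \
    .getOrCreate()
df_new = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
assert df_new.count() == 2
spark.stop()
{code}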






[jira] [Resolved] (SPARK-15246) Fix code style and improve volatile for SPARK-4452

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15246.
-
   Resolution: Fixed
 Assignee: Lianhui Wang
Fix Version/s: 2.0.0

>  Fix code style and improve volatile for SPARK-4452
> ---
>
> Key: SPARK-15246
> URL: https://issues.apache.org/jira/browse/SPARK-15246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Lianhui Wang
>Assignee: Lianhui Wang
> Fix For: 2.0.0
>
>
> Follow-up for SPARK-4452:
> 1. Fix code style.
> 2. Remove volatile from the elementsRead method because only one thread uses it.
> 3. Avoid volatile on _elementsRead because the collection increments 
> _elementsRead every time it inserts an element, which is very expensive, so we 
> can avoid it.






[jira] [Resolved] (SPARK-15255) RDD name from DataFrame op should not include full local relation data

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15255.
-
   Resolution: Fixed
 Assignee: Davies Liu
Fix Version/s: 2.0.0

> RDD name from DataFrame op should not include full local relation data
> --
>
> Key: SPARK-15255
> URL: https://issues.apache.org/jira/browse/SPARK-15255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Davies Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, if you create a DataFrame from local data, do some operations with 
> it, and cache it, then the name of the RDD in the "Storage" tab in the Spark 
> UI will contain the entire local relation's data.  This is not scalable and 
> can cause the browser to become unresponsive.
> I'd propose there be a limit on the size of the data to display.
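A minimal PySpark sketch of the scenario being described, with made-up sizes: a DataFrame built purely from local data is cached, and, per this report, the cached entry's name shown in the Web UI's Storage tab used to contain the entire local relation's data.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("local-relation-cache").getOrCreate()

# Build a DataFrame from purely local data; its logical plan is a local relation
# carrying every one of these rows.
rows = [(i, "x" * 100) for i in range(10000)]
df = spark.createDataFrame(rows, ["id", "payload"]).filter("id % 2 = 0")

# Caching it creates an entry in the Storage tab; per this report, that entry's
# name used to embed all of the local rows, which does not scale.
df.cache()
df.count()  # materialize the cache, then inspect the Storage tab in the Web UI
spark.stop()
{code}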






[jira] [Resolved] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15265.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.0.0

> Fix Union query error message indentation
> -
>
> Key: SPARK-15265
> URL: https://issues.apache.org/jira/browse/SPARK-15265
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 2.0.0
>
>
> This issue fixes the error message indentation consistently with other set 
> queries (EXCEPT/INTERSECT).
> **Before (4 lines)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException:
> Unions can only be performed on tables with the same number of columns,
>  but one table has '2' columns and another table has
>  '1' columns;
> {code}
> **After (one-line)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Unions can only be performed on 
> tables with the same number of columns, but one table has '2' columns and 
> another table has '1' columns;
> {code}
> **Reference**
> EXCEPT / INTERSECT uses one-line format like the following.
> {code}
> scala> sql("(select 1) intersect (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
> tables with the same number of columns, but the left table has 1 columns and 
> the right has 2;
> {code}






[jira] [Updated] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15263:

Target Version/s: 2.0.0

> Make shuffle service dir cleanup faster by using `rm -rf`
> -
>
> Key: SPARK-15263
> URL: https://issues.apache.org/jira/browse/SPARK-15263
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Priority: Minor
>
> The current logic for directory cleanup (JavaUtils.deleteRecursively) is 
> slow because it does a directory listing, recurses over child directories, 
> checks for symbolic links, deletes leaf files, and finally deletes the dirs 
> when they are empty. There is back-and-forth switching from kernel space to 
> user space while doing this. Since most of the deployment backends would be 
> Unix systems, we could essentially just do rm -rf so that the entire deletion 
> logic runs in kernel space.
> The current Java-based implementation in Spark seems to be similar to what 
> standard libraries like Guava and Commons IO do (e.g. 
> http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/FileUtils.java?view=markup#l1540).
> However, Guava removed this method in favour of shelling out to an operating 
> system command (which is exactly what I am proposing). See the Deprecated 
> note in the older Guava javadocs for details: 
> http://google.github.io/guava/releases/10.0.1/api/docs/com/google/common/io/Files.html#deleteRecursively(java.io.File)
> Ideally, Java should provide such APIs so that users won't have to do 
> such things to get platform-specific code. Also, it's not just about speed: 
> handling race conditions while doing FS deletions is also tricky. I 
> could find this bug for Java in a similar context: 
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7148952






[jira] [Updated] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15250:

Description: 
I see json() API in DataFrameReader should be removed as below:

{code}
  // TODO: Remove this one in Spark 2.0.
  def json(path: String): DataFrame = format("json").load(path)
{code}

It seems this is because another {{json()}} API below covers the above

{code}
 def json(paths: String*): DataFrame = format("json").load(paths : _*)
{code}



  was:
I see {{json() }}API in {{DataFrameReader}} should be removed as below:

{code}
  // TODO: Remove this one in Spark 2.0.
  def json(path: String): DataFrame = format("json").load(path)
{code}

It seems this is because another {{json()}} API below covers the above

{code}
 def json(paths: String*): DataFrame = format("json").load(paths : _*)
{code}




> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> I see json() API in DataFrameReader should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another {{json()}} API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Updated] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15250:

Description: 
I see json() API in DataFrameReader should be removed as below:

{code}
  // TODO: Remove this one in Spark 2.0.
  def json(path: String): DataFrame = format("json").load(path)
{code}

It seems this is because another json() API below covers the above

{code}
 def json(paths: String*): DataFrame = format("json").load(paths : _*)
{code}



  was:
I see json() API in DataFrameReader should be removed as below:

{code}
  // TODO: Remove this one in Spark 2.0.
  def json(path: String): DataFrame = format("json").load(path)
{code}

It seems this is because another {{json()}} API below covers the above

{code}
 def json(paths: String*): DataFrame = format("json").load(paths : _*)
{code}




> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> I see json() API in DataFrameReader should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another json() API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Resolved] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15250.
-
   Resolution: Fixed
 Assignee: Hyukjin Kwon
Fix Version/s: 2.0.0

> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> I see {{json() }}API in {{DataFrameReader}} should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another {{json()}} API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Resolved] (SPARK-15261) Remove experimental tag from DataFrameReader and DataFrameWriter

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15261.
-
Resolution: Fixed

> Remove experimental tag from DataFrameReader and DataFrameWriter
> 
>
> Key: SPARK-15261
> URL: https://issues.apache.org/jira/browse/SPARK-15261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> We should tag streaming-related stuff as experimental and remove the tag from the rest.






[jira] [Resolved] (SPARK-14476) Show table name or path in string of DataSourceScan

2016-05-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14476.
-
   Resolution: Fixed
 Assignee: Sean Zhong  (was: Cheng Lian)
Fix Version/s: 2.0.0

> Show table name or path in string of DataSourceScan
> ---
>
> Key: SPARK-14476
> URL: https://issues.apache.org/jira/browse/SPARK-14476
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Sean Zhong
>Priority: Critical
> Fix For: 2.0.0
>
>
> Right now, the string of DataSourceScan is only "HadoopFiles xxx", without 
> any information about the table name or path. 
> Since we have that in 1.6, this is kind of a regression.






[jira] [Assigned] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15265:


Assignee: (was: Apache Spark)

> Fix Union query error message indentation
> -
>
> Key: SPARK-15265
> URL: https://issues.apache.org/jira/browse/SPARK-15265
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> This issue fixes the error message indentation consistently with other set 
> queries (EXCEPT/INTERSECT).
> **Before (4 lines)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException:
> Unions can only be performed on tables with the same number of columns,
>  but one table has '2' columns and another table has
>  '1' columns;
> {code}
> **After (one-line)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Unions can only be performed on 
> tables with the same number of columns, but one table has '2' columns and 
> another table has '1' columns;
> {code}
> **Reference**
> EXCEPT / INTERSECT uses one-line format like the following.
> {code}
> scala> sql("(select 1) intersect (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
> tables with the same number of columns, but the left table has 1 columns and 
> the right has 2;
> {code}






[jira] [Commented] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279436#comment-15279436
 ] 

Apache Spark commented on SPARK-15265:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13043

> Fix Union query error message indentation
> -
>
> Key: SPARK-15265
> URL: https://issues.apache.org/jira/browse/SPARK-15265
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> This issue fixes the error message indentation consistently with other set 
> queries (EXCEPT/INTERSECT).
> **Before (4 lines)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException:
> Unions can only be performed on tables with the same number of columns,
>  but one table has '2' columns and another table has
>  '1' columns;
> {code}
> **After (one-line)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Unions can only be performed on 
> tables with the same number of columns, but one table has '2' columns and 
> another table has '1' columns;
> {code}
> **Reference**
> EXCEPT / INTERSECT uses one-line format like the following.
> {code}
> scala> sql("(select 1) intersect (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
> tables with the same number of columns, but the left table has 1 columns and 
> the right has 2;
> {code}






[jira] [Assigned] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15265:


Assignee: Apache Spark

> Fix Union query error message indentation
> -
>
> Key: SPARK-15265
> URL: https://issues.apache.org/jira/browse/SPARK-15265
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Trivial
>
> This issue fixes the error message indentation consistently with other set 
> queries (EXCEPT/INTERSECT).
> **Before (4 lines)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException:
> Unions can only be performed on tables with the same number of columns,
>  but one table has '2' columns and another table has
>  '1' columns;
> {code}
> **After (one-line)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Unions can only be performed on 
> tables with the same number of columns, but one table has '2' columns and 
> another table has '1' columns;
> {code}
> **Reference**
> EXCEPT / INTERSECT uses one-line format like the following.
> {code}
> scala> sql("(select 1) intersect (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
> tables with the same number of columns, but the left table has 1 columns and 
> the right has 2;
> {code}






[jira] [Updated] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-15265:
--
Description: 
This issue fixes the error message indentation consistently with other set 
queries (EXCEPT/INTERSECT).

**Before (4 lines)**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException:
Unions can only be performed on tables with the same number of columns,
 but one table has '2' columns and another table has
 '1' columns;
{code}

**After (one-line)**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Unions can only be performed on tables 
with the same number of columns, but one table has '2' columns and another 
table has '1' columns;
{code}

**Reference**
EXCEPT / INTERSECT uses one-line format like the following.
{code}
scala> sql("(select 1) intersect (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
tables with the same number of columns, but the left table has 1 columns and 
the right has 2;
{code}


  was:
This issue fixes the error message indentation consistently with other set 
queries (EXCEPT/INTERSECT).

**Before**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException:
Unions can only be performed on tables with the same number of columns,
 but one table has '2' columns and another table has
 '1' columns;
{code}

**After**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Unions can only be performed on tables 
with the same number of columns, but one table has '2' columns and another 
table has '1' columns;
{code}

**Reference**
EXCEPT / INTERSECT uses one-line format like the following.
{code}
scala> sql("(select 1) intersect (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
tables with the same number of columns, but the left table has 1 columns and 
the right has 2;
{code}



> Fix Union query error message indentation
> -
>
> Key: SPARK-15265
> URL: https://issues.apache.org/jira/browse/SPARK-15265
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> This issue fixes the error message indentation consistently with other set 
> queries (EXCEPT/INTERSECT).
> **Before (4 lines)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException:
> Unions can only be performed on tables with the same number of columns,
>  but one table has '2' columns and another table has
>  '1' columns;
> {code}
> **After (one-line)**
> {code}
> scala> sql("(select 1) union (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Unions can only be performed on 
> tables with the same number of columns, but one table has '2' columns and 
> another table has '1' columns;
> {code}
> **Reference**
> EXCEPT / INTERSECT uses one-line format like the following.
> {code}
> scala> sql("(select 1) intersect (select 1, 2)").head
> org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
> tables with the same number of columns, but the left table has 1 columns and 
> the right has 2;
> {code}






[jira] [Created] (SPARK-15265) Fix Union query error message indentation

2016-05-10 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-15265:
-

 Summary: Fix Union query error message indentation
 Key: SPARK-15265
 URL: https://issues.apache.org/jira/browse/SPARK-15265
 Project: Spark
  Issue Type: Bug
Reporter: Dongjoon Hyun
Priority: Trivial


This issue fixes the error message indentation consistently with other set 
queries (EXCEPT/INTERSECT).

**Before**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException:
Unions can only be performed on tables with the same number of columns,
 but one table has '2' columns and another table has
 '1' columns;
{code}

**After**
{code}
scala> sql("(select 1) union (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Unions can only be performed on tables 
with the same number of columns, but one table has '2' columns and another 
table has '1' columns;
{code}

**Reference**
EXCEPT / INTERSECT uses one-line format like the following.
{code}
scala> sql("(select 1) intersect (select 1, 2)").head
org.apache.spark.sql.AnalysisException: Intersect can only be performed on 
tables with the same number of columns, but the left table has 1 columns and 
the right has 2;
{code}







[jira] [Updated] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated SPARK-15263:

Description: 
The current logic for directory cleanup (JavaUtils.deleteRecursively) is slow 
because it does a directory listing, recurses over child directories, checks for 
symbolic links, deletes leaf files, and finally deletes the dirs when they are 
empty. There is back-and-forth switching from kernel space to user space while 
doing this. Since most of the deployment backends would be Unix systems, we 
could essentially just do rm -rf so that the entire deletion logic runs in 
kernel space.

The current Java-based implementation in Spark seems to be similar to what 
standard libraries like Guava and Commons IO do (e.g. 
http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/FileUtils.java?view=markup#l1540).
However, Guava removed this method in favour of shelling out to an operating 
system command (which is exactly what I am proposing). See the Deprecated note 
in the older Guava javadocs for details: 
http://google.github.io/guava/releases/10.0.1/api/docs/com/google/common/io/Files.html#deleteRecursively(java.io.File)

Ideally, Java should provide such APIs so that users won't have to do such 
things to get platform-specific code. Also, it's not just about speed: handling 
race conditions while doing FS deletions is also tricky. I could find this bug 
for Java in a similar context: 
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7148952
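
The actual Spark change lives in its Java/Scala shuffle-service code, but the trade-off reads the same in any language; the sketch below (Python, purely illustrative) contrasts a userspace recursive delete with shelling out to `rm -rf` so the whole traversal happens in a single external process.

{code}
import os
import shutil
import subprocess
import tempfile

def delete_recursively_userspace(path):
    # Userspace recursive delete: list entries, recurse, unlink one by one
    # (the same shape as JavaUtils.deleteRecursively / commons-io deletion).
    shutil.rmtree(path)

def delete_with_rm_rf(path):
    # Shell out so listing, recursion, and unlinking all happen inside one
    # external `rm -rf` process (Unix-only, as assumed in this proposal).
    subprocess.run(["rm", "-rf", path], check=True)

# Tiny demo on throwaway directories.
for delete in (delete_recursively_userspace, delete_with_rm_rf):
    d = tempfile.mkdtemp()
    os.makedirs(os.path.join(d, "a", "b"))
    open(os.path.join(d, "a", "b", "f.txt"), "w").close()
    delete(d)
    assert not os.path.exists(d)
{code}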

> Make shuffle service dir cleanup faster by using `rm -rf`
> -
>
> Key: SPARK-15263
> URL: https://issues.apache.org/jira/browse/SPARK-15263
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Priority: Minor
>
> The current logic for directory cleanup (JavaUtils.deleteRecursively) is 
> slow because it does a directory listing, recurses over child directories, 
> checks for symbolic links, deletes leaf files, and finally deletes the dirs 
> when they are empty. There is back-and-forth switching from kernel space to 
> user space while doing this. Since most of the deployment backends would be 
> Unix systems, we could essentially just do rm -rf so that the entire deletion 
> logic runs in kernel space.
> The current Java-based implementation in Spark seems to be similar to what 
> standard libraries like Guava and Commons IO do (e.g. 
> http://svn.apache.org/viewvc/commons/proper/io/trunk/src/main/java/org/apache/commons/io/FileUtils.java?view=markup#l1540).
> However, Guava removed this method in favour of shelling out to an operating 
> system command (which is exactly what I am proposing). See the Deprecated 
> note in the older Guava javadocs for details: 
> http://google.github.io/guava/releases/10.0.1/api/docs/com/google/common/io/Files.html#deleteRecursively(java.io.File)
> Ideally, Java should provide such APIs so that users won't have to do 
> such things to get platform-specific code. Also, it's not just about speed: 
> handling race conditions while doing FS deletions is also tricky. I 
> could find this bug for Java in a similar context: 
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7148952






[jira] [Commented] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279391#comment-15279391
 ] 

Apache Spark commented on SPARK-15263:
--

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/13042

> Make shuffle service dir cleanup faster by using `rm -rf`
> -
>
> Key: SPARK-15263
> URL: https://issues.apache.org/jira/browse/SPARK-15263
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Priority: Minor
>







[jira] [Assigned] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15263:


Assignee: Apache Spark

> Make shuffle service dir cleanup faster by using `rm -rf`
> -
>
> Key: SPARK-15263
> URL: https://issues.apache.org/jira/browse/SPARK-15263
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15263:


Assignee: (was: Apache Spark)

> Make shuffle service dir cleanup faster by using `rm -rf`
> -
>
> Key: SPARK-15263
> URL: https://issues.apache.org/jira/browse/SPARK-15263
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Priority: Minor
>







[jira] [Commented] (SPARK-15224) Can not delete jar and list jar in spark Thrift server

2016-05-10 Thread poseidon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279386#comment-15279386
 ] 

poseidon commented on SPARK-15224:
--

Well, it's very obvious: the exception says that it's not valid syntax. But in 
the original Hive SQL it's valid and works well. 
After we add a jar to the Thrift server, every SQL statement will depend on this 
jar, and every executor will add this dependency when it starts. 
If we cannot delete jars or see how many jars we have loaded, the Thrift server 
will become a very fat server after running for a while.


> Can not delete jar and list jar in spark Thrift server
> --
>
> Key: SPARK-15224
> URL: https://issues.apache.org/jira/browse/SPARK-15224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
> Environment: spark 1.6.1
> hive 1.2.1 
> hdfs 2.7.1 
>Reporter: poseidon
>Priority: Minor
>
> When you try to delete a jar and execute `delete jar` or `list jar` in your 
> Beeline client, it throws an exception:
> delete jar; 
> Error: org.apache.spark.sql.AnalysisException: line 1:7 missing FROM at 
> 'jars' near 'jars'
> line 1:12 missing EOF at 'myudfs' near 'jars'; (state=,code=0)
> list jar;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> 'list' 'jars' ''; line 1 pos 0 (state=,code=0)
> {code:title=funnlog.log|borderStyle=solid}
> 16/05/09 17:26:52 INFO thriftserver.SparkExecuteStatementOperation: Running 
> query 'list jar' with 1da09765-efb4-42dc-8890-3defca40f89d
> 16/05/09 17:26:52 INFO parse.ParseDriver: Parsing command: list jar
> NoViableAltException(26@[])
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1071)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.spark.sql.hive.HiveQl$.getAst(HiveQl.scala:276)
>   at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:303)
>   at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
>   at 
> org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>   at 
> scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>   at 
> scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>   at 
> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>   at 
> scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>   at 
> scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>   at 
> scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>   at 
> scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
>   at 
> scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
>   at 
> org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
>   at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:295)
>   at 
> org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
>   at 
> org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:240)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:239)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:282)
>   at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:65)
>   at 
> org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
>   at 
> 

[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-10 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279380#comment-15279380
 ] 

holdenk commented on SPARK-14813:
-

redid the links

> ML 2.0 QA: API: Python API coverage
> ---
>
> Key: SPARK-14813
> URL: https://issues.apache.org/jira/browse/SPARK-14813
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: holdenk
>
> For new public APIs added to MLlib, we need to check the generated HTML doc 
> and compare the Scala & Python versions.  We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> Please use a *separate* JIRA (linked below as "requires") for this list of 
> to-do items.
> UPDATE: This only needs to cover spark.ml since spark.mllib is going into 
> maintenance mode.






[jira] [Assigned] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15264:


Assignee: Apache Spark

> Spark 2.0 CSV Reader: Error on Blank Column Names
> -
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>Assignee: Apache Spark
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.






[jira] [Commented] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279367#comment-15279367
 ] 

Apache Spark commented on SPARK-15264:
--

User 'anabranch' has created a pull request for this issue:
https://github.com/apache/spark/pull/13041

> Spark 2.0 CSV Reader: Error on Blank Column Names
> -
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.






[jira] [Assigned] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15264:


Assignee: (was: Apache Spark)

> Spark 2.0 CSV Reader: Error on Blank Column Names
> -
>
> Key: SPARK-15264
> URL: https://issues.apache.org/jira/browse/SPARK-15264
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Bill Chambers
>
> When you read in a CSV file that starts with blank column names, the read 
> fails when you specify that you want a header.
> Pull request coming shortly.






[jira] [Created] (SPARK-15264) Spark 2.0 CSV Reader: Error on Blank Column Names

2016-05-10 Thread Bill Chambers (JIRA)
Bill Chambers created SPARK-15264:
-

 Summary: Spark 2.0 CSV Reader: Error on Blank Column Names
 Key: SPARK-15264
 URL: https://issues.apache.org/jira/browse/SPARK-15264
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Bill Chambers


When you read in a CSV file that starts with blank column names, the read fails 
when you specify that you want a header.

Pull request coming shortly.
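
A minimal PySpark reproduction of the reported shape of input, with a made-up file path and data: the header row's first field is empty, and the read is done with header enabled.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("blank-header-csv").getOrCreate()

# Hand-written CSV whose header row starts with a blank column name.
csv_path = "/tmp/blank_header.csv"  # hypothetical path
with open(csv_path, "w") as f:
    f.write(",name,age\n")          # first column name is empty
    f.write("0,alice,34\n")
    f.write("1,bob,29\n")

# With header=True the reader must turn the blank field into a column name;
# per this report, this is where the 2.0 CSV reader failed.
df = spark.read.csv(csv_path, header=True)
df.printSchema()
spark.stop()
{code}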






[jira] [Created] (SPARK-15263) Make shuffle service dir cleanup faster by using `rm -rf`

2016-05-10 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-15263:
---

 Summary: Make shuffle service dir cleanup faster by using `rm -rf`
 Key: SPARK-15263
 URL: https://issues.apache.org/jira/browse/SPARK-15263
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 1.6.1
Reporter: Tejas Patil
Priority: Minor









[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit

2016-05-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279318#comment-15279318
 ] 

Michael Gummelt commented on SPARK-10643:
-

+1 to fix this.  

> Support HDFS application download in client mode spark submit
> -
>
> Key: SPARK-10643
> URL: https://issues.apache.org/jira/browse/SPARK-10643
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Reporter: Alan Braithwaite
>Priority: Minor
>
> When using Mesos with Docker and Marathon, it would be nice to be able to 
> make spark-submit deployable on Marathon and have it download a jar from 
> HDFS instead of having to package the jar into the Docker image.
> {code}
> $ docker run -it docker.example.com/spark:latest 
> /usr/local/spark/bin/spark-submit  --class 
> com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar 
> Warning: Skip remote jar hdfs://hdfs/tmp/application.jar.
> java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Although I'm aware that we can run in cluster mode with mesos, we've already 
> built some nice tools surrounding marathon for logging and monitoring.
> Code in question:
> https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736






[jira] [Assigned] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15250:


Assignee: (was: Apache Spark)

> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> I see {{json() }}API in {{DataFrameReader}} should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another {{json()}} API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Commented] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279311#comment-15279311
 ] 

Apache Spark commented on SPARK-15250:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/13040

> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> I see {{json() }}API in {{DataFrameReader}} should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another {{json()}} API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Assigned] (SPARK-15250) Remove deprecated json API in DataFrameReader

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15250:


Assignee: Apache Spark

> Remove deprecated json API in DataFrameReader
> -
>
> Key: SPARK-15250
> URL: https://issues.apache.org/jira/browse/SPARK-15250
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> I see {{json() }}API in {{DataFrameReader}} should be removed as below:
> {code}
>   // TODO: Remove this one in Spark 2.0.
>   def json(path: String): DataFrame = format("json").load(path)
> {code}
> It seems this is because another {{json()}} API below covers the above
> {code}
>  def json(paths: String*): DataFrame = format("json").load(paths : _*)
> {code}






[jira] [Created] (SPARK-15262) race condition in killing an executor and reregistering an executor

2016-05-10 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-15262:


 Summary: race condition in killing an executor and reregistering 
an executor
 Key: SPARK-15262
 URL: https://issues.apache.org/jira/browse/SPARK-15262
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Shixiong Zhu


There is a race condition when killing an executor and reregistering an 
executor happen at the same time. Here are the execution steps to reproduce it.

1. master finds a worker is dead
2. master tells the driver to remove the executor
3. driver removes the executor
4. BlockManagerMasterEndpoint removes the block manager
5. executor finds via heartbeat that it is not registered
6. executor sends a re-register block manager request
7. the block manager is registered again
8. executor is killed by the worker
9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is not 
in the executor list
10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers

Because BlockManagerMasterEndpoint.blockManagerInfo contains some dead block 
managers, when we unpersist an RDD, remove a broadcast, or clean a shuffle block 
via an RPC endpoint of a dead block manager, we get a ClosedChannelException.
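
A toy Python sketch of the bookkeeping problem, not Spark's actual BlockManagerMasterEndpoint code; the names are made up, and the steps in the comments refer to the list above.

{code}
# Toy illustration only: why an entry for a dead executor can survive.
block_manager_info = {}   # executorId -> placeholder for an RPC endpoint
removed_executors = set()

def remove_executor(executor_id):
    # Steps 2-4: the driver removes the executor and its block manager.
    block_manager_info.pop(executor_id, None)
    removed_executors.add(executor_id)

def reregister(executor_id):
    # Steps 5-7 (buggy variant): re-registration triggered by a heartbeat is
    # accepted even though the executor was already removed.
    block_manager_info[executor_id] = "endpoint-of-" + executor_id

remove_executor("exec-1")
reregister("exec-1")
# Steps 8-10: the worker kills the executor, but the stale entry remains, so a
# later unpersist/broadcast-cleanup RPC hits a closed channel.
print(block_manager_info)  # {'exec-1': 'endpoint-of-exec-1'}
{code}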



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15262) race condition in killing an executor and reregistering an executor

2016-05-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-15262:
-
Affects Version/s: 1.6.1

> race condition in killing an executor and reregistering an executor
> ---
>
> Key: SPARK-15262
> URL: https://issues.apache.org/jira/browse/SPARK-15262
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Shixiong Zhu
>
> There is a race condition when killing an executor and reregistering an 
> executor happen at the same time. Here are the execution steps to reproduce it.
> 1. master finds a worker is dead
> 2. master tells the driver to remove the executor
> 3. driver removes the executor
> 4. BlockManagerMasterEndpoint removes the block manager
> 5. executor finds via heartbeat that it is not registered
> 6. executor sends a re-register block manager request
> 7. the block manager is registered again
> 8. executor is killed by the worker
> 9. CoarseGrainedSchedulerBackend ignores onDisconnected as this address is 
> not in the executor list
> 10. BlockManagerMasterEndpoint.blockManagerInfo contains dead block managers
> Because BlockManagerMasterEndpoint.blockManagerInfo contains some dead block 
> managers, when we unpersist an RDD, remove a broadcast, or clean a shuffle 
> block via an RPC endpoint of a dead block manager, we get a 
> ClosedChannelException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15243) Binarizer.explainParam(u"...") raises ValueError

2016-05-10 Thread Kazuki Yokoishi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279299#comment-15279299
 ] 

Kazuki Yokoishi commented on SPARK-15243:
-

Thank you for handling this issue.
But the same problems caused by isinstance(obj, str) seem to still remain in 
dataframe.py and types.py.

$ grep -r "isinstance(.*, str)" python/pyspark/
python/pyspark/ml/param/__init__.py:if isinstance(paramName, str):
python/pyspark/ml/param/__init__.py:elif isinstance(param, str):
python/pyspark/sql/dataframe.py:if not isinstance(col, str):
python/pyspark/sql/dataframe.py:if not isinstance(col, str):
python/pyspark/sql/dataframe.py:if not isinstance(col1, str):
python/pyspark/sql/dataframe.py:if not isinstance(col2, str):
python/pyspark/sql/dataframe.py:if not isinstance(col1, str):
python/pyspark/sql/dataframe.py:if not isinstance(col2, str):
python/pyspark/sql/dataframe.py:if not isinstance(col1, str):
python/pyspark/sql/dataframe.py:if not isinstance(col2, str):
python/pyspark/sql/types.py:if not isinstance(name, str):
python/pyspark/sql/types.py:if isinstance(field, str) and data_type 
is None:
python/pyspark/sql/types.py:if isinstance(data_type, str):
python/pyspark/sql/types.py:if isinstance(key, str):

> Binarizer.explainParam(u"...") raises ValueError
> 
>
> Key: SPARK-15243
> URL: https://issues.apache.org/jira/browse/SPARK-15243
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
> Environment: CentOS 7, Spark 1.6.0
>Reporter: Kazuki Yokoishi
>Priority: Minor
>
> When unicode is passed to Binarizer.explainParam(), ValueError occurs.
> To reproduce:
> {noformat}
> >>> binarizer = Binarizer(threshold=1.0, inputCol="values", 
> >>> outputCol="features")
> >>> binarizer.explainParam("threshold") # str can be passed
> 'threshold: threshold in binary classification prediction, in range [0, 1] 
> (default: 0.0, current: 1.0)'
> >>> binarizer.explainParam(u"threshold") # unicode cannot be passed
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 binarizer.explainParam(u"threshold")
> /usr/spark/current/python/pyspark/ml/param/__init__.py in explainParam(self, 
> param)
>  96 default value and user-supplied value in a string.
>  97 """
> ---> 98 param = self._resolveParam(param)
>  99 values = []
> 100 if self.isDefined(param):
> /usr/spark/current/python/pyspark/ml/param/__init__.py in _resolveParam(self, 
> param)
> 231 return self.getParam(param)
> 232 else:
> --> 233 raise ValueError("Cannot resolve %r as a param." % param)
> 234 
> 235 @staticmethod
> ValueError: Cannot resolve u'threshold' as a param.
> {noformat}
> The same errors occur in other methods.
> * Binarizer.hasDefault()
> * Binarizer.getOrDefault()
> * Binarizer.isSet()
> These errors are caused by checks *isinstance(obj, str)* in 
> pyspark.ml.param.Params._resolveParam().
> basestring should be used instead of str in isinstance() for backward 
> compatibility as below.
> {noformat}
> if sys.version >= '3':
>  basestring = str
> if isinstance(obj, basestring):
> # TODO
> {noformat}
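
A minimal sketch of the compatibility check proposed above, runnable on both Python 2 and Python 3; the {{is_param_name}} helper is illustrative and not part of PySpark.

{code}
# Sketch only, standard library: Python 2/3 compatible string check.
import sys

if sys.version_info >= (3, 0):
    basestring = str  # on Python 3 there is no separate unicode type

def is_param_name(obj):
    """True if obj can be treated as a parameter name (str or unicode)."""
    return isinstance(obj, basestring)

print(is_param_name("threshold"))   # True on Python 2 and 3
print(is_param_name(u"threshold"))  # True here; fails with a plain `str` check on Python 2
{code}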



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15248) Partition added with ALTER TABLE to a hive partitioned table is not read while querying

2016-05-10 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-15248:
--
Description: 
Table partitions can be added with locations different from the default 
warehouse location of a Hive table. 

{{CREATE TABLE parquetTable (a int) PARTITIONED BY (b int) STORED AS parquet}} 
{{ALTER TABLE parquetTable ADD PARTITION (b=1) LOCATION '/path/1'}}

Querying such a table throws an error because the MetastoreFileCatalog does not 
list the added partition directory; it only lists the default base location.


{code}
[info] - SPARK-15248: explicitly added partitions should be readable *** FAILED 
*** (1 second, 8 milliseconds)
[info]   java.util.NoSuchElementException: key not found: 
file:/Users/tdas/Projects/Spark/spark2/target/tmp/spark-b39ad224-c5d1-4966-8981-fb45a2066d61/non-default-partition
[info]   at scala.collection.MapLike$class.default(MapLike.scala:228)
[info]   at scala.collection.AbstractMap.default(Map.scala:59)
[info]   at scala.collection.MapLike$class.apply(MapLike.scala:141)
[info]   at scala.collection.AbstractMap.apply(Map.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:55)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info]   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog.listFiles(PartitioningAwareFileCatalog.scala:55)
[info]   at 
org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:93)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
[info]   at 
org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:55)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.QueryTest.assertEmptyMissingInput(QueryTest.scala:330)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:146)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:159)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:554)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:125)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTempDir(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7.apply$mcV$sp(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:166)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTable(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12.apply$mcV$sp(parquetSuites.scala:534)
[info]   at 
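
A hedged PySpark repro sketch of the scenario above; the table name and paths are placeholders and a Hive-enabled SparkSession is assumed. The final query is where a missing-key failure like the one in the trace shows up.

{code}
# Sketch only: placeholder table/paths, Hive support assumed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-location-repro")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE TABLE parquetTable (a INT) PARTITIONED BY (b INT) STORED AS PARQUET")
spark.sql("ALTER TABLE parquetTable ADD PARTITION (b=1) LOCATION '/path/1'")

# The added partition lives outside the table's default location, so the file
# catalog misses it when the table is queried.
spark.sql("SELECT * FROM parquetTable WHERE b = 1").show()
{code}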

[jira] [Updated] (SPARK-15248) Partition added with ALTER TABLE to a hive partitioned table is not read while querying

2016-05-10 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-15248:
--
Description: 
Table partitions can be added with locations different from the default 
warehouse location of a Hive table. 

{{CREATE TABLE parquetTable (a int) PARTITIONED BY (b int) STORED AS parquet}} 
{{ALTER TABLE parquetTable ADD PARTITION (b=1) LOCATION '/path/1'}}

Querying such a table throws an error because the MetastoreFileCatalog does not 
list the added partition directory; it only lists the default base location.

{{[info] - SPARK-15248: explicitly added partitions should be readable *** 
FAILED *** (1 second, 8 milliseconds)
[info]   java.util.NoSuchElementException: key not found: 
file:/Users/tdas/Projects/Spark/spark2/target/tmp/spark-b39ad224-c5d1-4966-8981-fb45a2066d61/non-default-partition
[info]   at scala.collection.MapLike$class.default(MapLike.scala:228)
[info]   at scala.collection.AbstractMap.default(Map.scala:59)
[info]   at scala.collection.MapLike$class.apply(MapLike.scala:141)
[info]   at scala.collection.AbstractMap.apply(Map.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:55)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info]   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog.listFiles(PartitioningAwareFileCatalog.scala:55)
[info]   at 
org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:93)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
[info]   at 
org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:55)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.QueryTest.assertEmptyMissingInput(QueryTest.scala:330)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:146)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:159)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:554)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:125)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTempDir(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7.apply$mcV$sp(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:166)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTable(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12.apply$mcV$sp(parquetSuites.scala:534)
[info]   at 

[jira] [Updated] (SPARK-15248) Partition added with ALTER TABLE to a hive partitioned table is not read while querying

2016-05-10 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-15248:
--
Description: 
Table partitions can be added with locations different from the default 
warehouse location of a Hive table. 

{{CREATE TABLE parquetTable (a int) PARTITIONED BY (b int) STORED AS parquet}} 
{{ALTER TABLE parquetTable ADD PARTITION (b=1) LOCATION '/path/1'}}

Querying such a table throws an error because the MetastoreFileCatalog does not 
list the added partition directory; it only lists the default base location.

{{
[info] - SPARK-15248: explicitly added partitions should be readable *** FAILED 
*** (1 second, 8 milliseconds)
[info]   java.util.NoSuchElementException: key not found: 
file:/Users/tdas/Projects/Spark/spark2/target/tmp/spark-b39ad224-c5d1-4966-8981-fb45a2066d61/non-default-partition
[info]   at scala.collection.MapLike$class.default(MapLike.scala:228)
[info]   at scala.collection.AbstractMap.default(Map.scala:59)
[info]   at scala.collection.MapLike$class.apply(MapLike.scala:141)
[info]   at scala.collection.AbstractMap.apply(Map.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:59)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog$$anonfun$listFiles$1.apply(PartitioningAwareFileCatalog.scala:55)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info]   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[info]   at 
org.apache.spark.sql.execution.datasources.PartitioningAwareFileCatalog.listFiles(PartitioningAwareFileCatalog.scala:55)
[info]   at 
org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:93)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:55)
[info]   at 
org.apache.spark.sql.execution.SparkStrategies$SpecialLimits$.apply(SparkStrategies.scala:55)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:59)
[info]   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
[info]   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
[info]   at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:60)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:77)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:75)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:82)
[info]   at 
org.apache.spark.sql.QueryTest.assertEmptyMissingInput(QueryTest.scala:330)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:146)
[info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:159)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:554)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7$$anonfun$apply$mcV$sp$25.apply(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTempDir(SQLTestUtils.scala:125)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTempDir(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12$$anonfun$apply$mcV$sp$7.apply$mcV$sp(parquetSuites.scala:535)
[info]   at 
org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:166)
[info]   at 
org.apache.spark.sql.hive.ParquetPartitioningTest.withTable(parquetSuites.scala:726)
[info]   at 
org.apache.spark.sql.hive.ParquetMetastoreSuite$$anonfun$12.apply$mcV$sp(parquetSuites.scala:534)
[info]   at 

[jira] [Assigned] (SPARK-15260) UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15260:


Assignee: Apache Spark  (was: Andrew Or)

> UnifiedMemoryManager could be in bad state if any exception happen while 
> evicting blocks
> 
>
> Key: SPARK-15260
> URL: https://issues.apache.org/jira/browse/SPARK-15260
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> {code}
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 62 in stage 19.0 failed 4 times, most
> recent failure: Lost task 62.3 in stage 19.0 (TID 2841, 
> ip-10-109-240-229.ec2.internal): java.io.IOException:
> java.lang.AssertionError: assertion failed at 
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1223) at
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock( 
> TorrentBroadcast.scala:165) at
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute( 
> TorrentBroadcast.scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast. 
> scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast. 
> scala:88) at
> org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 71) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 46) at
> org.apache.spark.scheduler.Task.run(Task.scala:96) at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. 
> java:1142) at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> java:617) at
> java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: 
> assertion failed at
> scala.Predef$.assert(Predef.scala:165) at 
> org.apache.spark.memory.UnifiedMemoryManager.acquireStorageMemory(
> UnifiedMemoryManager.scala:140) at 
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:387) at
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:346) at
> org.apache.spark.storage.MemoryStore.putBytes(MemoryStore.scala:99) at
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:803) at
> org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:690) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:130) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:127) at
> scala.Option.map(Option.scala:145) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$
> TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:127) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> scala.Option.orElse(Option.scala:257) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast. 
> scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> scala.collection.immutable.List.foreach(List.scala:318) at
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$
> TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$ 
> 1.apply(TorrentBroadcast.scala:175) at
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220) ... 12 more 
> Driver stacktrace: at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$
> DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1419) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1418) at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray. 
> scala:59) at
> 
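
A toy Python illustration of the failure mode named in the title, not Spark's UnifiedMemoryManager or MemoryStore: if eviction throws after the bookkeeping has already been updated, the manager's view of used memory drifts from reality, and later sanity checks can fail the way the assertion in the trace does.

{code}
# Toy sketch only: optimistic bookkeeping around a failing eviction.
class ToyMemoryManager(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def acquire(self, amount, evict_fn):
        if self.used + amount > self.capacity:
            to_free = self.used + amount - self.capacity
            self.used -= to_free  # optimistic: assumes the eviction will succeed
            evict_fn(to_free)     # may raise, leaving `used` understated
        assert 0 <= self.used <= self.capacity
        self.used += amount

def failing_evict(n):
    raise IOError("error while spilling %d bytes" % n)

mm = ToyMemoryManager(capacity=100)
mm.used = 90
try:
    mm.acquire(30, failing_evict)
except IOError:
    pass
print(mm.used)  # 70, although nothing was evicted: the manager is now in a bad state
{code}

The safer pattern is to adjust the bookkeeping only by what the eviction callback reports as actually freed, typically inside a try/finally.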

[jira] [Comment Edited] (SPARK-14148) Kmeans Sum of squares - Within cluster, between clusters and total

2016-05-10 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211393#comment-15211393
 ] 

Narine Kokhlikyan edited comment on SPARK-14148 at 5/10/16 11:57 PM:
-

I can work on this. Will start after Kmeans optimizations go in.


was (Author: narine):
I can work on. Will start after Kmeans optimizations go in.

> Kmeans Sum of squares - Within cluster, between clusters and total
> --
>
> Key: SPARK-14148
> URL: https://issues.apache.org/jira/browse/SPARK-14148
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Narine Kokhlikyan
>Priority: Minor
>
> As discussed in: 
> https://github.com/apache/spark/pull/10806#issuecomment-200324279
> creating this JIRA for adding the following features to KMeans: 
> within-cluster sum of squares, between-cluster sum of squares, and total sum of 
> squares. 
> cc [~mengxr]
> Link to R’s Documentation
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html
> Link to sklearn’s documentation
> http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
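
A minimal NumPy sketch (not the requested SparkR/ML implementation) of the three quantities, given points, cluster assignments, and cluster centers.

{code}
# Sketch only: within-cluster, between-cluster, and total sum of squares.
import numpy as np

def kmeans_sums_of_squares(points, labels, centers):
    grand_mean = points.mean(axis=0)
    total_ss = ((points - grand_mean) ** 2).sum()
    within_ss = sum(((points[labels == k] - centers[k]) ** 2).sum()
                    for k in range(len(centers)))
    between_ss = total_ss - within_ss  # holds when centers are the cluster means
    return within_ss, between_ss, total_ss

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.5], [5.0, 5.5]])
print(kmeans_sums_of_squares(points, labels, centers))  # (1.0, 50.0, 51.0)
{code}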



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15260) UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279263#comment-15279263
 ] 

Apache Spark commented on SPARK-15260:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/13039

> UnifiedMemoryManager could be in bad state if any exception happen while 
> evicting blocks
> 
>
> Key: SPARK-15260
> URL: https://issues.apache.org/jira/browse/SPARK-15260
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>
> {code}
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 62 in stage 19.0 failed 4 times, most
> recent failure: Lost task 62.3 in stage 19.0 (TID 2841, 
> ip-10-109-240-229.ec2.internal): java.io.IOException:
> java.lang.AssertionError: assertion failed at 
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1223) at
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock( 
> TorrentBroadcast.scala:165) at
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute( 
> TorrentBroadcast.scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast. 
> scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast. 
> scala:88) at
> org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 71) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 46) at
> org.apache.spark.scheduler.Task.run(Task.scala:96) at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. 
> java:1142) at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> java:617) at
> java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: 
> assertion failed at
> scala.Predef$.assert(Predef.scala:165) at 
> org.apache.spark.memory.UnifiedMemoryManager.acquireStorageMemory(
> UnifiedMemoryManager.scala:140) at 
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:387) at
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:346) at
> org.apache.spark.storage.MemoryStore.putBytes(MemoryStore.scala:99) at
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:803) at
> org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:690) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:130) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:127) at
> scala.Option.map(Option.scala:145) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$
> TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:127) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> scala.Option.orElse(Option.scala:257) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast. 
> scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> scala.collection.immutable.List.foreach(List.scala:318) at
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$
> TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$ 
> 1.apply(TorrentBroadcast.scala:175) at
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220) ... 12 more 
> Driver stacktrace: at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$
> DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1419) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1418) at
> 

[jira] [Assigned] (SPARK-15260) UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15260:


Assignee: Andrew Or  (was: Apache Spark)

> UnifiedMemoryManager could be in bad state if any exception happen while 
> evicting blocks
> 
>
> Key: SPARK-15260
> URL: https://issues.apache.org/jira/browse/SPARK-15260
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Davies Liu
>Assignee: Andrew Or
>
> {code}
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 62 in stage 19.0 failed 4 times, most
> recent failure: Lost task 62.3 in stage 19.0 (TID 2841, 
> ip-10-109-240-229.ec2.internal): java.io.IOException:
> java.lang.AssertionError: assertion failed at 
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1223) at
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock( 
> TorrentBroadcast.scala:165) at
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute( 
> TorrentBroadcast.scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast. 
> scala:64) at
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast. 
> scala:88) at
> org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 71) at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 46) at
> org.apache.spark.scheduler.Task.run(Task.scala:96) at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. 
> java:1142) at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
> java:617) at
> java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: 
> assertion failed at
> scala.Predef$.assert(Predef.scala:165) at 
> org.apache.spark.memory.UnifiedMemoryManager.acquireStorageMemory(
> UnifiedMemoryManager.scala:140) at 
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:387) at
> org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:346) at
> org.apache.spark.storage.MemoryStore.putBytes(MemoryStore.scala:99) at
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:803) at
> org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:690) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:130) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
> 1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
> TorrentBroadcast.scala:127) at
> scala.Option.map(Option.scala:145) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$
> TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:127) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
> TorrentBroadcast.scala:137) at
> scala.Option.orElse(Option.scala:257) at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast. 
> scala:137) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
> broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
> scala.collection.immutable.List.foreach(List.scala:318) at
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$
> TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120) at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$ 
> 1.apply(TorrentBroadcast.scala:175) at
> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220) ... 12 more 
> Driver stacktrace: at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$
> DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1419) at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
> DAGScheduler.scala:1418) at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray. 
> scala:59) at
> 

[jira] [Assigned] (SPARK-15261) Remove experimental tag from DataFrameReader and DataFrameWriter

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15261:


Assignee: Apache Spark  (was: Reynold Xin)

> Remove experimental tag from DataFrameReader and DataFrameWriter
> 
>
> Key: SPARK-15261
> URL: https://issues.apache.org/jira/browse/SPARK-15261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>
> We should tag streaming-related stuff as experimental and remove the tag from the rest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15261) Remove experimental tag from DataFrameReader and DataFrameWriter

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15261:


Assignee: Reynold Xin  (was: Apache Spark)

> Remove experimental tag from DataFrameReader and DataFrameWriter
> 
>
> Key: SPARK-15261
> URL: https://issues.apache.org/jira/browse/SPARK-15261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> We should tag streaming-related stuff as experimental and remove the tag from the rest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15261) Remove experimental tag from DataFrameReader and DataFrameWriter

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279259#comment-15279259
 ] 

Apache Spark commented on SPARK-15261:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/13038

> Remove experimental tag from DataFrameReader and DataFrameWriter
> 
>
> Key: SPARK-15261
> URL: https://issues.apache.org/jira/browse/SPARK-15261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> We should tag streaming-related stuff as experimental and remove the tag from the rest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15261) Remove experimental tag from DataFrameReader and DataFrameWriter

2016-05-10 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-15261:
---

 Summary: Remove experimental tag from DataFrameReader and 
DataFrameWriter
 Key: SPARK-15261
 URL: https://issues.apache.org/jira/browse/SPARK-15261
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


We should tag streaming-related stuff as experimental and remove the tag from the rest.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15260) UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks

2016-05-10 Thread Davies Liu (JIRA)
Davies Liu created SPARK-15260:
--

 Summary: UnifiedMemoryManager could be in bad state if any 
exception happen while evicting blocks
 Key: SPARK-15260
 URL: https://issues.apache.org/jira/browse/SPARK-15260
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.1, 1.6.0, 2.0.0
Reporter: Davies Liu
Assignee: Andrew Or


{code}

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
62 in stage 19.0 failed 4 times, most
recent failure: Lost task 62.3 in stage 19.0 (TID 2841, 
ip-10-109-240-229.ec2.internal): java.io.IOException:
java.lang.AssertionError: assertion failed at 
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1223) at
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock( 
TorrentBroadcast.scala:165) at
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute( 
TorrentBroadcast.scala:64) at
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast. scala:64) 
at
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast. 
scala:88) at
org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 71) at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala: 46) at
org.apache.spark.scheduler.Task.run(Task.scala:96) at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. 
java:1142) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. 
java:617) at
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: 
assertion failed at
scala.Predef$.assert(Predef.scala:165) at 
org.apache.spark.memory.UnifiedMemoryManager.acquireStorageMemory(
UnifiedMemoryManager.scala:140) at 
org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:387) at
org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:346) at
org.apache.spark.storage.MemoryStore.putBytes(MemoryStore.scala:99) at
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:803) at
org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:690) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
TorrentBroadcast.scala:130) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$or
1c5ab38dcb7d9b112f54b116debbe7fcast$$anonfun$$getRemote$1$1.apply( 
TorrentBroadcast.scala:127) at
scala.Option.map(Option.scala:145) at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$
TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:127) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
TorrentBroadcast.scala:137) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply( 
TorrentBroadcast.scala:137) at
scala.Option.orElse(Option.scala:257) at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast. 
scala:137) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$
broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala: 120) at
scala.collection.immutable.List.foreach(List.scala:318) at
org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$
TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120) at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$ 
1.apply(TorrentBroadcast.scala:175) at
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1220) ... 12 more 
Driver stacktrace: at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$
DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
DAGScheduler.scala:1419) at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( 
DAGScheduler.scala:1418) at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray. scala:59) 
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala: 1418) at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1. 
apply(DAGScheduler.scala:799) at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1. 
apply(DAGScheduler.scala:799) at
scala.Option.foreach(Option.scala:236) at 

[jira] [Resolved] (SPARK-14837) Add support in file stream source for reading new files added to subdirs

2016-05-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14837.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12616
[https://github.com/apache/spark/pull/12616]

> Add support in file stream source for reading new files added to subdirs
> 
>
> Key: SPARK-14837
> URL: https://issues.apache.org/jira/browse/SPARK-14837
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15114) Column name generated by typed aggregate is super verbose

2016-05-10 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279229#comment-15279229
 ] 

Dilip Biswal commented on SPARK-15114:
--

Going to submit a PR for this tonight.

> Column name generated by typed aggregate is super verbose
> -
>
> Key: SPARK-15114
> URL: https://issues.apache.org/jira/browse/SPARK-15114
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> {code}
> case class Person(name: String, email: String, age: Long)
> val ds = spark.read.json("/tmp/person.json").as[Person]
> import org.apache.spark.sql.expressions.scala.typed._
> ds.groupByKey(_ => 0).agg(sum(_.age))
> // org.apache.spark.sql.Dataset[(Int, Double)] = [value: int, 
> typedsumdouble(unresolveddeserializer(newInstance(class Person), age#0L, 
> email#1, name#2), upcast(value)): double]
> ds.groupByKey(_ => 0).agg(sum(_.age)).explain
> == Physical Plan ==
> WholeStageCodegen
> :  +- TungstenAggregate(key=[value#84], 
> functions=[(TypedSumDouble($line15.$read$$iw$$iw$Person),mode=Final,isDistinct=false)],
>  output=[value#84,typedsumdouble(unresolveddeserializer(newInstance(class 
> $line15.$read$$iw$$iw$Person), age#0L, email#1, name#2), upcast(value))#91])
> : +- INPUT
> +- Exchange hashpartitioning(value#84, 200), None
>+- WholeStageCodegen
>   :  +- TungstenAggregate(key=[value#84], 
> functions=[(TypedSumDouble($line15.$read$$iw$$iw$Person),mode=Partial,isDistinct=false)],
>  output=[value#84,value#97])
>   : +- INPUT
>   +- AppendColumns , newInstance(class 
> $line15.$read$$iw$$iw$Person), [input[0, int] AS value#84]
>  +- WholeStageCodegen
> :  +- Scan HadoopFiles[age#0L,email#1,name#2] Format: JSON, 
> PushedFilters: [], ReadSchema: struct
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1301) Add UI elements to collapse "Aggregated Metrics by Executor" pane on stage page

2016-05-10 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279188#comment-15279188
 ] 

Alex Bozarth commented on SPARK-1301:
-

Submitted a small fix similar to Ryan's above, but in line with how it's done on 
the Jobs and Stages pages.

> Add UI elements to collapse "Aggregated Metrics by Executor" pane on stage 
> page
> ---
>
> Key: SPARK-1301
> URL: https://issues.apache.org/jira/browse/SPARK-1301
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: Starter
>
> This table is useful but it takes up a lot of space on larger clusters, 
> hiding the more commonly accessed "stage" page. We could also move the table 
> below if collapsing it is difficult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1301) Add UI elements to collapse "Aggregated Metrics by Executor" pane on stage page

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279177#comment-15279177
 ] 

Apache Spark commented on SPARK-1301:
-

User 'ajbozarth' has created a pull request for this issue:
https://github.com/apache/spark/pull/13037

> Add UI elements to collapse "Aggregated Metrics by Executor" pane on stage 
> page
> ---
>
> Key: SPARK-1301
> URL: https://issues.apache.org/jira/browse/SPARK-1301
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: Starter
>
> This table is useful but it takes up a lot of space on larger clusters, 
> hiding the more commonly accessed "stage" page. We could also move the table 
> below if collapsing it is difficult.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15243) Binarizer.explainParam(u"...") raises ValueError

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15243:


Assignee: Apache Spark

> Binarizer.explainParam(u"...") raises ValueError
> 
>
> Key: SPARK-15243
> URL: https://issues.apache.org/jira/browse/SPARK-15243
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
> Environment: CentOS 7, Spark 1.6.0
>Reporter: Kazuki Yokoishi
>Assignee: Apache Spark
>Priority: Minor
>
> When unicode is passed to Binarizer.explainParam(), ValueError occurs.
> To reproduce:
> {noformat}
> >>> binarizer = Binarizer(threshold=1.0, inputCol="values", 
> >>> outputCol="features")
> >>> binarizer.explainParam("threshold") # str can be passed
> 'threshold: threshold in binary classification prediction, in range [0, 1] 
> (default: 0.0, current: 1.0)'
> >>> binarizer.explainParam(u"threshold") # unicode cannot be passed
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 binarizer.explainParam(u"threshold")
> /usr/spark/current/python/pyspark/ml/param/__init__.py in explainParam(self, 
> param)
>  96 default value and user-supplied value in a string.
>  97 """
> ---> 98 param = self._resolveParam(param)
>  99 values = []
> 100 if self.isDefined(param):
> /usr/spark/current/python/pyspark/ml/param/__init__.py in _resolveParam(self, 
> param)
> 231 return self.getParam(param)
> 232 else:
> --> 233 raise ValueError("Cannot resolve %r as a param." % param)
> 234 
> 235 @staticmethod
> ValueError: Cannot resolve u'threshold' as a param.
> {noformat}
> The same errors occur in other methods.
> * Binarizer.hasDefault()
> * Binarizer.getOrDefault()
> * Binarizer.isSet()
> These errors are caused by checks *isinstance(obj, str)* in 
> pyspark.ml.param.Params._resolveParam().
> basestring should be used instead of str in isinstance() for backward 
> compatibility as below.
> {noformat}
> if sys.version >= '3':
>  basestring = str
> if isinstance(obj, basestring):
> # TODO
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15243) Binarizer.explainParam(u"...") raises ValueError

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279161#comment-15279161
 ] 

Apache Spark commented on SPARK-15243:
--

User 'sethah' has created a pull request for this issue:
https://github.com/apache/spark/pull/13036

> Binarizer.explainParam(u"...") raises ValueError
> 
>
> Key: SPARK-15243
> URL: https://issues.apache.org/jira/browse/SPARK-15243
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
> Environment: CentOS 7, Spark 1.6.0
>Reporter: Kazuki Yokoishi
>Priority: Minor
>
> When unicode is passed to Binarizer.explainParam(), ValueError occurs.
> To reproduce:
> {noformat}
> >>> binarizer = Binarizer(threshold=1.0, inputCol="values", 
> >>> outputCol="features")
> >>> binarizer.explainParam("threshold") # str can be passed
> 'threshold: threshold in binary classification prediction, in range [0, 1] 
> (default: 0.0, current: 1.0)'
> >>> binarizer.explainParam(u"threshold") # unicode cannot be passed
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 binarizer.explainParam(u"threshold")
> /usr/spark/current/python/pyspark/ml/param/__init__.py in explainParam(self, 
> param)
>  96 default value and user-supplied value in a string.
>  97 """
> ---> 98 param = self._resolveParam(param)
>  99 values = []
> 100 if self.isDefined(param):
> /usr/spark/current/python/pyspark/ml/param/__init__.py in _resolveParam(self, 
> param)
> 231 return self.getParam(param)
> 232 else:
> --> 233 raise ValueError("Cannot resolve %r as a param." % param)
> 234 
> 235 @staticmethod
> ValueError: Cannot resolve u'threshold' as a param.
> {noformat}
> The same errors occur in other methods.
> * Binarizer.hasDefault()
> * Binarizer.getOrDefault()
> * Binarizer.isSet()
> These errors are caused by checks *isinstance(obj, str)* in 
> pyspark.ml.param.Params._resolveParam().
> basestring should be used instead of str in isinstance() for backward 
> compatibility as below.
> {noformat}
> if sys.version >= '3':
>  basestring = str
> if isinstance(obj, basestring):
> # TODO
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15243) Binarizer.explainParam(u"...") raises ValueError

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15243:


Assignee: (was: Apache Spark)

> Binarizer.explainParam(u"...") raises ValueError
> 
>
> Key: SPARK-15243
> URL: https://issues.apache.org/jira/browse/SPARK-15243
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
> Environment: CentOS 7, Spark 1.6.0
>Reporter: Kazuki Yokoishi
>Priority: Minor
>
> When unicode is passed to Binarizer.explainParam(), ValueError occurs.
> To reproduce:
> {noformat}
> >>> binarizer = Binarizer(threshold=1.0, inputCol="values", 
> >>> outputCol="features")
> >>> binarizer.explainParam("threshold") # str can be passed
> 'threshold: threshold in binary classification prediction, in range [0, 1] 
> (default: 0.0, current: 1.0)'
> >>> binarizer.explainParam(u"threshold") # unicode cannot be passed
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 binarizer.explainParam(u"threshold")
> /usr/spark/current/python/pyspark/ml/param/__init__.py in explainParam(self, 
> param)
>  96 default value and user-supplied value in a string.
>  97 """
> ---> 98 param = self._resolveParam(param)
>  99 values = []
> 100 if self.isDefined(param):
> /usr/spark/current/python/pyspark/ml/param/__init__.py in _resolveParam(self, 
> param)
> 231 return self.getParam(param)
> 232 else:
> --> 233 raise ValueError("Cannot resolve %r as a param." % param)
> 234 
> 235 @staticmethod
> ValueError: Cannot resolve u'threshold' as a param.
> {noformat}
> Same errors occur in other methods.
> * Binarizer.hasDefault()
> * Binarizer.getOrDefault()
> * Binarizer.isSet()
> These errors are caused by the *isinstance(obj, str)* checks in 
> pyspark.ml.param.Params._resolveParam().
> basestring should be used instead of str in isinstance() for backward 
> compatibility, as below.
> {noformat}
> if sys.version >= '3':
>  basestring = str
> if isinstance(obj, basestring):
> # TODO
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15259) Sort time metric should not include spill and record insertion time

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15259:


Assignee: Apache Spark

> Sort time metric should not include spill and record insertion time
> ---
>
> Key: SPARK-15259
> URL: https://issues.apache.org/jira/browse/SPARK-15259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Assignee: Apache Spark
>Priority: Minor
>
> After SPARK-14669 it seems the sort time metric includes both spill and 
> record insertion time. This makes it not very useful since the metric becomes 
> close to the total execution time of the node.
> We should track just the time spent for in-memory sort, as before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15259) Sort time metric should not include spill and record insertion time

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279131#comment-15279131
 ] 

Apache Spark commented on SPARK-15259:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/13035

> Sort time metric should not include spill and record insertion time
> ---
>
> Key: SPARK-15259
> URL: https://issues.apache.org/jira/browse/SPARK-15259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> After SPARK-14669 it seems the sort time metric includes both spill and 
> record insertion time. This makes it not very useful since the metric becomes 
> close to the total execution time of the node.
> We should track just the time spent for in-memory sort, as before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15259) Sort time metric

2016-05-10 Thread Eric Liang (JIRA)
Eric Liang created SPARK-15259:
--

 Summary: Sort time metric
 Key: SPARK-15259
 URL: https://issues.apache.org/jira/browse/SPARK-15259
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Eric Liang
Priority: Minor


After SPARK-14669 it seems the sort time metric includes both spill and record 
insertion time. This makes it not very useful since the metric becomes close to 
the total execution time of the node.

We should track just the time spent for in-memory sort, as before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15259) Sort time metric should not include spill and record insertion time

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15259:


Assignee: (was: Apache Spark)

> Sort time metric should not include spill and record insertion time
> ---
>
> Key: SPARK-15259
> URL: https://issues.apache.org/jira/browse/SPARK-15259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> After SPARK-14669 it seems the sort time metric includes both spill and 
> record insertion time. This makes it not very useful since the metric becomes 
> close to the total execution time of the node.
> We should track just the time spent for in-memory sort, as before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15259) Sort time metric should not include spill and record insertion time

2016-05-10 Thread Eric Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Liang updated SPARK-15259:
---
Summary: Sort time metric should not include spill and record insertion 
time  (was: Sort time metric)

> Sort time metric should not include spill and record insertion time
> ---
>
> Key: SPARK-15259
> URL: https://issues.apache.org/jira/browse/SPARK-15259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> After SPARK-14669 it seems the sort time metric includes both spill and 
> record insertion time. This makes it not very useful since the metric becomes 
> close to the total execution time of the node.
> We should track just the time spent for in-memory sort, as before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14936) FlumePollingStreamSuite is slow

2016-05-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-14936.
--
   Resolution: Fixed
 Assignee: Xin Ren
Fix Version/s: 2.0.0

> FlumePollingStreamSuite is slow
> ---
>
> Key: SPARK-14936
> URL: https://issues.apache.org/jira/browse/SPARK-14936
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Reporter: Josh Rosen
>Assignee: Xin Ren
> Fix For: 2.0.0
>
>
> FlumePollingStreamSuite contains two tests which run for a minute each. This 
> seems excessively slow and we should speed it up if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15243) Binarizer.explainParam(u"...") raises ValueError

2016-05-10 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279091#comment-15279091
 ] 

Seth Hendrickson commented on SPARK-15243:
--

I'll submit a PR shortly.

> Binarizer.explainParam(u"...") raises ValueError
> 
>
> Key: SPARK-15243
> URL: https://issues.apache.org/jira/browse/SPARK-15243
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
> Environment: CentOS 7, Spark 1.6.0
>Reporter: Kazuki Yokoishi
>Priority: Minor
>
> When unicode is passed to Binarizer.explainParam(), ValueError occurs.
> To reproduce:
> {noformat}
> >>> binarizer = Binarizer(threshold=1.0, inputCol="values", 
> >>> outputCol="features")
> >>> binarizer.explainParam("threshold") # str can be passed
> 'threshold: threshold in binary classification prediction, in range [0, 1] 
> (default: 0.0, current: 1.0)'
> >>> binarizer.explainParam(u"threshold") # unicode cannot be passed
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 binarizer.explainParam(u"threshold")
> /usr/spark/current/python/pyspark/ml/param/__init__.py in explainParam(self, 
> param)
>  96 default value and user-supplied value in a string.
>  97 """
> ---> 98 param = self._resolveParam(param)
>  99 values = []
> 100 if self.isDefined(param):
> /usr/spark/current/python/pyspark/ml/param/__init__.py in _resolveParam(self, 
> param)
> 231 return self.getParam(param)
> 232 else:
> --> 233 raise ValueError("Cannot resolve %r as a param." % param)
> 234 
> 235 @staticmethod
> ValueError: Cannot resolve u'threshold' as a param.
> {noformat}
> Same errors occur in other methods.
> * Binarizer.hasDefault()
> * Binarizer.getOrDefault()
> * Binarizer.isSet()
> These errors are caused by the *isinstance(obj, str)* checks in 
> pyspark.ml.param.Params._resolveParam().
> basestring should be used instead of str in isinstance() for backward 
> compatibility, as below.
> {noformat}
> if sys.version >= '3':
>  basestring = str
> if isinstance(obj, basestring):
> # TODO
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15145) port binary classification evaluator to spark.ml

2016-05-10 Thread Miao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miao Wang updated SPARK-15145:
--
Description: As we discussed in #12922, binary classification evaluator 
should be ported from mllib to spark.ml after 2.0 release.  (was: spark.ml 
binary classification should include accuracy. This JIRA is related to 
SPARK-14900.)

> port binary classification evaluator to spark.ml
> 
>
> Key: SPARK-15145
> URL: https://issues.apache.org/jira/browse/SPARK-15145
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Miao Wang
>
> As we discussed in #12922, binary classification evaluator should be ported 
> from mllib to spark.ml after 2.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15145) port binary classification evaluator to spark.ml

2016-05-10 Thread Miao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miao Wang updated SPARK-15145:
--
Summary: port binary classification evaluator to spark.ml  (was: spark.ml 
binary classification should include accuracy)

> port binary classification evaluator to spark.ml
> 
>
> Key: SPARK-15145
> URL: https://issues.apache.org/jira/browse/SPARK-15145
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Miao Wang
>Priority: Minor
>
> spark.ml binary classification should include accuracy. This JIRA is related 
> to SPARK-14900.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15145) port binary classification evaluator to spark.ml

2016-05-10 Thread Miao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miao Wang updated SPARK-15145:
--
Priority: Major  (was: Minor)

> port binary classification evaluator to spark.ml
> 
>
> Key: SPARK-15145
> URL: https://issues.apache.org/jira/browse/SPARK-15145
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Miao Wang
>
> spark.ml binary classification should include accuracy. This JIRA is related 
> to SPARK-14900.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15256) Clarify the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15256:


Assignee: Apache Spark

> Clarify the docstring for DataFrameReader.jdbc()
> 
>
> Key: SPARK-15256
> URL: https://issues.apache.org/jira/browse/SPARK-15256
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.6.1
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Minor
>
> The doc for the {{properties}} parameter [currently 
> reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:
> {quote}
> :param properties: JDBC database connection arguments, a list of 
> arbitrary string
>tag/value. Normally at least a "user" and 
> "password" property
>should be included.
> {quote}
> This is incorrect, since {{properties}} is expected to be a dictionary.
> Some of the other parameters have cryptic descriptions. I'll try to clarify 
> those as well.
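
To illustrate the point, a hypothetical call showing {{properties}} passed as a dictionary rather than a list (the URL, table name, and credentials below are placeholders, and an existing {{sqlContext}} is assumed):

{code}
# Hypothetical usage sketch: properties is a dict of connection arguments,
# not a list of tag/value strings.
df = sqlContext.read.jdbc(
    url="jdbc:postgresql://localhost:5432/testdb",   # placeholder URL
    table="some_table",                              # placeholder table name
    properties={"user": "spark", "password": "secret"})
{code}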



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15256) Clarify the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279025#comment-15279025
 ] 

Apache Spark commented on SPARK-15256:
--

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/13034

> Clarify the docstring for DataFrameReader.jdbc()
> 
>
> Key: SPARK-15256
> URL: https://issues.apache.org/jira/browse/SPARK-15256
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.6.1
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The doc for the {{properties}} parameter [currently 
> reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:
> {quote}
> :param properties: JDBC database connection arguments, a list of 
> arbitrary string
>tag/value. Normally at least a "user" and 
> "password" property
>should be included.
> {quote}
> This is incorrect, since {{properties}} is expected to be a dictionary.
> Some of the other parameters have cryptic descriptions. I'll try to clarify 
> those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15256) Clarify the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15256:


Assignee: (was: Apache Spark)

> Clarify the docstring for DataFrameReader.jdbc()
> 
>
> Key: SPARK-15256
> URL: https://issues.apache.org/jira/browse/SPARK-15256
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.6.1
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The doc for the {{properties}} parameter [currently 
> reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:
> {quote}
> :param properties: JDBC database connection arguments, a list of 
> arbitrary string
>tag/value. Normally at least a "user" and 
> "password" property
>should be included.
> {quote}
> This is incorrect, since {{properties}} is expected to be a dictionary.
> Some of the other parameters have cryptic descriptions. I'll try to clarify 
> those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15255) RDD name from DataFrame op should not include full local relation data

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15255:


Assignee: (was: Apache Spark)

> RDD name from DataFrame op should not include full local relation data
> --
>
> Key: SPARK-15255
> URL: https://issues.apache.org/jira/browse/SPARK-15255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Currently, if you create a DataFrame from local data, do some operations with 
> it, and cache it, then the name of the RDD in the "Storage" tab in the Spark 
> UI will contain the entire local relation's data.  This is not scalable and 
> can cause the browser to become unresponsive.
> I'd propose there be a limit on the size of the data to display.
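
To make the report concrete, a rough reproduction sketch (the data and column names are made up; an existing {{sqlContext}} is assumed):

{code}
# Build a DataFrame from a sizeable local collection, transform it, and cache it.
rows = [(i, "x" * 100) for i in range(10000)]
df = sqlContext.createDataFrame(rows, ["id", "payload"])
cached = df.select("id").cache()
cached.count()
# The cached RDD's name shown in the Storage tab currently embeds the whole
# LocalRelation (all 10000 rows) rather than a truncated summary.
{code}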



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15255) RDD name from DataFrame op should not include full local relation data

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15255:


Assignee: Apache Spark

> RDD name from DataFrame op should not include full local relation data
> --
>
> Key: SPARK-15255
> URL: https://issues.apache.org/jira/browse/SPARK-15255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, if you create a DataFrame from local data, do some operations with 
> it, and cache it, then the name of the RDD in the "Storage" tab in the Spark 
> UI will contain the entire local relation's data.  This is not scalable and 
> can cause the browser to become unresponsive.
> I'd propose there be a limit on the size of the data to display.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15255) RDD name from DataFrame op should not include full local relation data

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279007#comment-15279007
 ] 

Apache Spark commented on SPARK-15255:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/13033

> RDD name from DataFrame op should not include full local relation data
> --
>
> Key: SPARK-15255
> URL: https://issues.apache.org/jira/browse/SPARK-15255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Currently, if you create a DataFrame from local data, do some operations with 
> it, and cache it, then the name of the RDD in the "Storage" tab in the Spark 
> UI will contain the entire local relation's data.  This is not scalable and 
> can cause the browser to become unresponsive.
> I'd propose there be a limit on the size of the data to display.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15249) Use FunctionResource instead of (String, String) in CreateFunction and CatalogFunction for resource

2016-05-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15249.
---
  Resolution: Fixed
Assignee: Sandeep Singh
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> ---
>
> Key: SPARK-15249
> URL: https://issues.apache.org/jira/browse/SPARK-15249
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sandeep Singh
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.0.0
>
>
> Use FunctionResource instead of (String, String) in CreateFunction and 
> CatalogFunction for resource
> see: TODO's here
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L36
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L42



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278877#comment-15278877
 ] 

Apache Spark commented on SPARK-15257:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/13032

> Require CREATE EXTERNAL TABLE to specify LOCATION
> -
>
> Key: SPARK-15257
> URL: https://issues.apache.org/jira/browse/SPARK-15257
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
> `LOCATION`, the table will still be created in the warehouse directory, but 
> its metadata won't be deleted even when the user drops the table! This is a 
> problem. We should require the user to also specify `LOCATION`.
> Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is 
> not yet supported.
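
For illustration only, a rough sketch of the behaviour change being proposed (the table and path names are made up; a Hive-enabled {{sqlContext}} is assumed):

{code}
# Today the first statement silently creates the table in the warehouse directory.
# Under this proposal it would be rejected because no LOCATION is given.
sqlContext.sql("CREATE EXTERNAL TABLE ext_tbl (id INT)")
sqlContext.sql("CREATE EXTERNAL TABLE ext_tbl (id INT) LOCATION '/tmp/ext_tbl'")
{code}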



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15257:


Assignee: Apache Spark  (was: Andrew Or)

> Require CREATE EXTERNAL TABLE to specify LOCATION
> -
>
> Key: SPARK-15257
> URL: https://issues.apache.org/jira/browse/SPARK-15257
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>
> Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
> `LOCATION`, the table will still be created in the warehouse directory, but 
> its metadata won't be deleted even when the user drops the table! This is a 
> problem. We should require the user to also specify `LOCATION`.
> Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is 
> not yet supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15257:


Assignee: Andrew Or  (was: Apache Spark)

> Require CREATE EXTERNAL TABLE to specify LOCATION
> -
>
> Key: SPARK-15257
> URL: https://issues.apache.org/jira/browse/SPARK-15257
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
> `LOCATION`, the table will still be created in the warehouse directory, but 
> its metadata won't be deleted even when the user drops the table! This is a 
> problem. We should require the user to also specify `LOCATION`.
> Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is 
> not yet supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15258) Nested/Chained case statements generate codegen over 64k exception

2016-05-10 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278859#comment-15278859
 ] 

Jonathan Gray commented on SPARK-15258:
---

I ran this against the master branch as of 8th May 2016, and it also exhibits 
the same behaviour.

> Nested/Chained case statements generate codegen over 64k exception
> --
>
> Key: SPARK-15258
> URL: https://issues.apache.org/jira/browse/SPARK-15258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Jonathan Gray
> Attachments: NestedCases.scala
>
>
> Nested/Chained case-when expressions generate codegen that exceeds the 64k 
> method size limit and throws an exception.
> The attached test demonstrates this behaviour.
> I'd like to try and fix this but don't really know the best place to start.  
> Ideally, I'd like to avoid the codegen fallback as with large volumes this 
> hurts performance.
> This is similar(ish) to SPARK-13242 but I'd like to see if there are any 
> alternatives to the codegen fallback approach.
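
As a rough PySpark analogue of the attached Scala test (the branch count and column names are arbitrary), the following builds a deeply chained CASE WHEN of the kind that can push the generated Java method past the 64KB limit; it is a sketch, not the attached repro:

{code}
from pyspark.sql import functions as F

# Chain a few hundred WHEN ... OTHERWISE branches into a single expression.
expr = F.lit(-1)
for i in range(300):
    expr = F.when(F.col("x") == i, F.lit(i)).otherwise(expr)

df = sqlContext.range(1000).withColumnRenamed("id", "x")
df.select(expr.alias("bucket")).count()  # may fail with a "grows beyond 64 KB" codegen error
{code}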



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15256) Clarify the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-15256:
-
Description: 
The doc for the {{properties}} parameter [currently 
reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:

{quote}
:param properties: JDBC database connection arguments, a list of 
arbitrary string
   tag/value. Normally at least a "user" and "password" 
property
   should be included.
{quote}

This is incorrect, since {{properties}} is expected to be a dictionary.

Some of the other parameters have cryptic descriptions. I'll try to clarify 
those as well.

  was:
The doc for the {{properties}} parameter [currently 
reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:

{quote}
:param properties: JDBC database connection arguments, a list of 
arbitrary string
   tag/value. Normally at least a "user" and "password" 
property
   should be included.
{quote}

This is incorrect, since {{properties}} is expected to be a dictionary.


> Clarify the docstring for DataFrameReader.jdbc()
> 
>
> Key: SPARK-15256
> URL: https://issues.apache.org/jira/browse/SPARK-15256
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.6.1
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The doc for the {{properties}} parameter [currently 
> reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:
> {quote}
> :param properties: JDBC database connection arguments, a list of 
> arbitrary string
>tag/value. Normally at least a "user" and 
> "password" property
>should be included.
> {quote}
> This is incorrect, since {{properties}} is expected to be a dictionary.
> Some of the other parameters have cryptic descriptions. I'll try to clarify 
> those as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15258) Nested/Chained case statements generate codegen over 64k exception

2016-05-10 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated SPARK-15258:
--
Attachment: NestedCases.scala

Example code that exhibits the exceptional behaviour

> Nested/Chained case statements generate codegen over 64k exception
> --
>
> Key: SPARK-15258
> URL: https://issues.apache.org/jira/browse/SPARK-15258
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Jonathan Gray
> Attachments: NestedCases.scala
>
>
> Nested/Chained case-when expressions generate codegen that exceeds the 64k 
> method size limit and throws an exception.
> The attached test demonstrates this behaviour.
> I'd like to try and fix this but don't really know the best place to start.  
> Ideally, I'd like to avoid the codegen fallback as with large volumes this 
> hurts performance.
> This is similar(ish) to SPARK-13242 but I'd like to see if there are any 
> alternatives to the codegen fallback approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15256) Clarify the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-15256:
-
Summary: Clarify the docstring for DataFrameReader.jdbc()  (was: Correct 
the docstring for DataFrameReader.jdbc())

> Clarify the docstring for DataFrameReader.jdbc()
> 
>
> Key: SPARK-15256
> URL: https://issues.apache.org/jira/browse/SPARK-15256
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.6.1
>Reporter: Nicholas Chammas
>Priority: Minor
>
> The doc for the {{properties}} parameter [currently 
> reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:
> {quote}
> :param properties: JDBC database connection arguments, a list of 
> arbitrary string
>tag/value. Normally at least a "user" and 
> "password" property
>should be included.
> {quote}
> This is incorrect, since {{properties}} is expected to be a dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15258) Nested/Chained case statements generate codegen over 64k exception

2016-05-10 Thread Jonathan Gray (JIRA)
Jonathan Gray created SPARK-15258:
-

 Summary: Nested/Chained case statements generate codegen over 64k 
exception
 Key: SPARK-15258
 URL: https://issues.apache.org/jira/browse/SPARK-15258
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: Jonathan Gray


Nested/Chained case-when expressions generate codegen that exceeds the 64k method 
size limit and throws an exception.

The attached test demonstrates this behaviour.

I'd like to try and fix this but don't really know the best place to start.  
Ideally, I'd like to avoid the codegen fallback as with large volumes this 
hurts performance.

This is similar(ish) to SPARK-13242 but I'd like to see if there are any 
alternatives to the codegen fallback approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15257) Require CREATE EXTERNAL TABLE to specify LOCATION

2016-05-10 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15257:
-

 Summary: Require CREATE EXTERNAL TABLE to specify LOCATION
 Key: SPARK-15257
 URL: https://issues.apache.org/jira/browse/SPARK-15257
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now when the user runs `CREATE EXTERNAL TABLE` without specifying 
`LOCATION`, the table will still be created in the warehouse directory, but its 
metadata won't be deleted even when the user drops the table! This is a 
problem. We should require the user to also specify `LOCATION`.

Note: This does not apply to `CREATE EXTERNAL TABLE ... USING`, which is not 
yet supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6005) Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery

2016-05-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-6005.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery
> 
>
> Key: SPARK-6005
> URL: https://issues.apache.org/jira/browse/SPARK-6005
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Iulian Dragos
>Assignee: Shixiong Zhu
>  Labels: flaky-test, kafka, streaming
> Fix For: 2.0.0
>
>
> [Link to failing test on 
> Jenkins|https://ci.typesafe.com/view/Spark/job/spark-nightly-build/lastCompletedBuild/testReport/org.apache.spark.streaming.kafka/DirectKafkaStreamSuite/offset_recovery/]
> {code}
> The code passed to eventually never returned normally. Attempted 208 times 
> over 10.00622791 seconds. Last failure message: strings.forall({   ((elem: 
> Any) => DirectKafkaStreamSuite.collectedData.contains(elem)) }) was false.
> {code}
> {code:title=Stack trace}
> sbt.ForkMain$ForkError: The code passed to eventually never returned 
> normally. Attempted 208 times over 10.00622791 seconds. Last failure message: 
> strings.forall({
>   ((elem: Any) => DirectKafkaStreamSuite.collectedData.contains(elem))
> }) was false.
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
>   at 
> org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
>   at 
> org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$5.org$apache$spark$streaming$kafka$DirectKafkaStreamSuite$$anonfun$$sendDataAndWaitForReceive$1(DirectKafkaStreamSuite.scala:225)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$5.apply$mcV$sp(DirectKafkaStreamSuite.scala:287)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$5.apply(DirectKafkaStreamSuite.scala:211)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$5.apply(DirectKafkaStreamSuite.scala:211)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:39)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
>   at 
> org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:39)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> 

[jira] [Created] (SPARK-15256) Correct the docstring for DataFrameReader.jdbc()

2016-05-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-15256:


 Summary: Correct the docstring for DataFrameReader.jdbc()
 Key: SPARK-15256
 URL: https://issues.apache.org/jira/browse/SPARK-15256
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PySpark
Affects Versions: 1.6.1
Reporter: Nicholas Chammas
Priority: Minor


The doc for the {{properties}} parameter [currently 
reads|https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L437-L439]:

{quote}
:param properties: JDBC database connection arguments, a list of 
arbitrary string
   tag/value. Normally at least a "user" and "password" 
property
   should be included.
{quote}

This is incorrect, since {{properties}} is expected to be a dictionary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15207) Use Travis CI for Java Linter and JDK7/8 compilation test

2016-05-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15207:
--
  Assignee: Dongjoon Hyun
  Priority: Minor  (was: Major)
Issue Type: New Feature  (was: Task)

> Use Travis CI for Java Linter and JDK7/8 compilation test
> -
>
> Key: SPARK-15207
> URL: https://issues.apache.org/jira/browse/SPARK-15207
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.1.0
>
>
> Currently, Java Linter is disabled in Jenkins tests.
> https://github.com/apache/spark/blob/master/dev/run-tests.py#L554
> However, as of today, Spark has 721 Java files with 97362 lines of code 
> (excluding blanks/comments), about 1/3 the size of the Scala code.
> {code}
> Language     files     blank     comment       code
> Scala         2353     62819      124060     318747
> Java           721     18617       23314      97362
> {code}
> This issue aims to take advantage of Travis CI to handle the following static 
> analysis by adding a single file, `.travis.yml` without any additional burden 
> on the existing servers.
> - Java Linter
> - JDK7/JDK8 maven compile
> Note that this issue does not propose to remove any of the above work items 
> from Jenkins. That may be possible later, but we need to observe Travis CI's 
> stability for a while first. The goal of this issue is to remove committers' 
> overhead on linter-related PRs (the original PR and the follow-up fix PR).
> By the way, historically, Spark used Travis CI before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15207) Use Travis CI for Java Linter and JDK7/8 compilation test

2016-05-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15207.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 12980
[https://github.com/apache/spark/pull/12980]

> Use Travis CI for Java Linter and JDK7/8 compilation test
> -
>
> Key: SPARK-15207
> URL: https://issues.apache.org/jira/browse/SPARK-15207
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Reporter: Dongjoon Hyun
> Fix For: 2.1.0
>
>
> Currently, Java Linter is disabled in Jenkins tests.
> https://github.com/apache/spark/blob/master/dev/run-tests.py#L554
> However, as of today, Spark has 721 Java files with 97362 lines of code 
> (excluding blanks/comments), about 1/3 the size of the Scala code.
> {code}
> Language     files     blank     comment       code
> Scala         2353     62819      124060     318747
> Java           721     18617       23314      97362
> {code}
> This issue aims to take advantage of Travis CI to handle the following static 
> analysis by adding a single file, `.travis.yml` without any additional burden 
> on the existing servers.
> - Java Linter
> - JDK7/JDK8 maven compile
> Note that this issue does not propose to remove any of the above work items 
> from Jenkins. That may be possible later, but we need to observe Travis CI's 
> stability for a while first. The goal of this issue is to remove committers' 
> overhead on linter-related PRs (the original PR and the follow-up fix PR).
> By the way, historically, Spark used Travis CI before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15255) RDD name from DataFrame op should not include full local relation data

2016-05-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15255:

Component/s: Web UI

> RDD name from DataFrame op should not include full local relation data
> --
>
> Key: SPARK-15255
> URL: https://issues.apache.org/jira/browse/SPARK-15255
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Currently, if you create a DataFrame from local data, do some operations with 
> it, and cache it, then the name of the RDD in the "Storage" tab in the Spark 
> UI will contain the entire local relation's data.  This is not scalable and 
> can cause the browser to become unresponsive.
> I'd propose there be a limit on the size of the data to display.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14986) Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all inner columns are projected out

2016-05-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-14986:
-
Assignee: Herman van Hovell

> Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all 
> inner columns are projected out
> -
>
> Key: SPARK-14986
> URL: https://issues.apache.org/jira/browse/SPARK-14986
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Andrey Balmin
>Assignee: Herman van Hovell
> Fix For: 2.0.0
>
>
> Repro:   using Hive context, run this SQL query:
>select  nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array ()) 
> n as nil
> Actual result: returns 0 rows.
> Expected results:  should return 1 row with null value.
> Details:
> If the query is modified to also return x:
>select x, nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array 
> ()) n as nil
> it works correctly and returns 1 row: [ 1, null ]
> Clearly, changing Select clause of a query should not change the number of 
> rows it returns.
> Looking at the query plan it seems that the Generator object was 
> (incorrectly) marked with "join=false".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14986) Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all inner columns are projected out

2016-05-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14986.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12906
[https://github.com/apache/spark/pull/12906]

> Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all 
> inner columns are projected out
> -
>
> Key: SPARK-14986
> URL: https://issues.apache.org/jira/browse/SPARK-14986
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Andrey Balmin
> Fix For: 2.0.0
>
>
> Repro:   using Hive context, run this SQL query:
>select  nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array ()) 
> n as nil
> Actual result: returns 0 rows.
> Expected results:  should return 1 row with null value.
> Details:
> If the query is modified to also return x:
>select x, nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array 
> ()) n as nil
> it works correctly and returns 1 row: [ 1, null ]
> Clearly, changing Select clause of a query should not change the number of 
> rows it returns.
> Looking at the query plan it seems that the Generator object was 
> (incorrectly) marked with "join=false".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15181) Python API for Generalized Linear Regression Summary

2016-05-10 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278740#comment-15278740
 ] 

Joseph K. Bradley commented on SPARK-15181:
---

Is this a duplicate of [SPARK-14982]?

> Python API for Generalized Linear Regression Summary
> 
>
> Key: SPARK-15181
> URL: https://issues.apache.org/jira/browse/SPARK-15181
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Seth Hendrickson
>
> We should add an interface to the GLR summaries in Python for feature parity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15169) Consider improving HasSolver to allow generalization

2016-05-10 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-15169:
--
Summary: Consider improving HasSolver to allow generalization  (was: 
Consider improving HasSolver to allow generilization)

> Consider improving HasSolver to allow generalization
> 
>
> Key: SPARK-15169
> URL: https://issues.apache.org/jira/browse/SPARK-15169
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Priority: Trivial
>
> The current HasSolver shared param has a fixed default value of "auto" and no 
> validation. Some algorithms (see `MultilayerPerceptronClassifier`) have 
> different default values or validators. This results in either a mostly 
> duplicated param (as in `MultilayerPerceptronClassifier`) or incorrect 
> scaladoc (as in `GeneralizedLinearRegression`).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15161) Consider moving featureImportances into TreeEnsemble models base class

2016-05-10 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278735#comment-15278735
 ] 

Joseph K. Bradley commented on SPARK-15161:
---

Sounds fine, but we'll need to make sure it's still Java-friendly.

> Consider moving featureImportances into TreeEnsemble models base class
> --
>
> Key: SPARK-15161
> URL: https://issues.apache.org/jira/browse/SPARK-15161
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Priority: Minor
>
> Right now each of the subclasses implements it; we could consider moving 
> it to the base class (after 2.0). cc [~mlnick]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14642) import org.apache.spark.sql.expressions._ breaks udf under functions

2016-05-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-14642:
-
Assignee: Subhobrata Dey

> import org.apache.spark.sql.expressions._ breaks udf under functions
> 
>
> Key: SPARK-14642
> URL: https://issues.apache.org/jira/browse/SPARK-14642
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Subhobrata Dey
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The following code works
> {code}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> udf((v: String) => v.stripSuffix("-abc"))
> res0: org.apache.spark.sql.expressions.UserDefinedFunction = 
> UserDefinedFunction(,StringType,Some(List(StringType)))
> {code}
> But, the following does not
> {code}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> import org.apache.spark.sql.expressions._
> import org.apache.spark.sql.expressions._
> scala> udf((v: String) => v.stripSuffix("-abc"))
> :30: error: No TypeTag available for String
>udf((v: String) => v.stripSuffix("-abc"))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14642) import org.apache.spark.sql.expressions._ breaks udf under functions

2016-05-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-14642.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> import org.apache.spark.sql.expressions._ breaks udf under functions
> 
>
> Key: SPARK-14642
> URL: https://issues.apache.org/jira/browse/SPARK-14642
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The following code works
> {code}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> udf((v: String) => v.stripSuffix("-abc"))
> res0: org.apache.spark.sql.expressions.UserDefinedFunction = 
> UserDefinedFunction(,StringType,Some(List(StringType)))
> {code}
> But, the following does not
> {code}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> import org.apache.spark.sql.expressions._
> import org.apache.spark.sql.expressions._
> scala> udf((v: String) => v.stripSuffix("-abc"))
> :30: error: No TypeTag available for String
>udf((v: String) => v.stripSuffix("-abc"))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15171) Deprecate registerTempTable and add dataset.createTempView

2016-05-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-15171:
-
Issue Type: New Feature  (was: Bug)

> Deprecate registerTempTable and add dataset.createTempView
> --
>
> Key: SPARK-15171
> URL: https://issues.apache.org/jira/browse/SPARK-15171
> Project: Spark
>  Issue Type: New Feature
>Reporter: Sean Zhong
>Priority: Critical
>  Labels: release_notes, releasenotes
>
> Our current dataset.registerTempTable does not actually materialize data, so it 
> should really be regarded as creating a temp view. We can deprecate it and add a 
> new method, dataset.createTempView(replaceIfExists: Boolean), with replaceIfExists 
> defaulting to false. registerTempTable would then delegate to 
> dataset.createTempView(replaceIfExists = true).
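
A sketch of how the proposed API might look from PySpark, following the description above (this method does not exist yet; the name and flag are the ones proposed here):

{code}
# Proposed: explicit temp view creation with an overwrite flag.
df.createTempView("people", replaceIfExists=False)   # would fail if "people" already exists
df.createTempView("people", replaceIfExists=True)    # would silently replace an existing view

# Deprecated path, kept for compatibility; would delegate to replaceIfExists=True.
df.registerTempTable("people")
{code}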



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


