[jira] [Created] (SPARK-16003) SerializationDebugger runs into an infinite loop

2016-06-16 Thread Davies Liu (JIRA)
Davies Liu created SPARK-16003:
--

 Summary: SerializationDebugger runs into an infinite loop
 Key: SPARK-16003
 URL: https://issues.apache.org/jira/browse/SPARK-16003
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Davies Liu
Priority: Critical


This is observed while debugging 
https://issues.apache.org/jira/browse/SPARK-15811

We should fix it or disable it by default.






[jira] [Assigned] (SPARK-15966) Fix markdown for Spark Monitoring

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15966:


Assignee: Apache Spark

> Fix markdown for Spark Monitoring
> -
>
> Key: SPARK-15966
> URL: https://issues.apache.org/jira/browse/SPARK-15966
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Dhruve Ashar
>Assignee: Apache Spark
>Priority: Trivial
>
> The markdown for Spark monitoring needs to be fixed. 
> http://spark.apache.org/docs/2.0.0-preview/monitoring.html 
> The closing tag is missing for `spark.ui.view.acls.groups`, which is causing 
> the markdown to render incorrectly.






[jira] [Assigned] (SPARK-15966) Fix markdown for Spark Monitoring

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15966:


Assignee: (was: Apache Spark)

> Fix markdown for Spark Monitoring
> -
>
> Key: SPARK-15966
> URL: https://issues.apache.org/jira/browse/SPARK-15966
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Dhruve Ashar
>Priority: Trivial
>
> The markdown for Spark monitoring needs to be fixed. 
> http://spark.apache.org/docs/2.0.0-preview/monitoring.html 
> The closing tag is missing for `spark.ui.view.acls.groups`, which is causing 
> the markdown to render incorrectly.






[jira] [Commented] (SPARK-15966) Fix markdown for Spark Monitoring

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334820#comment-15334820
 ] 

Apache Spark commented on SPARK-15966:
--

User 'dhruve' has created a pull request for this issue:
https://github.com/apache/spark/pull/13719

> Fix markdown for Spark Monitoring
> -
>
> Key: SPARK-15966
> URL: https://issues.apache.org/jira/browse/SPARK-15966
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Dhruve Ashar
>Priority: Trivial
>
> The markdown for Spark monitoring needs to be fixed. 
> http://spark.apache.org/docs/2.0.0-preview/monitoring.html 
> The closing tag is missing for `spark.ui.view.acls.groups`, which is causing 
> the markdown to render incorrectly.






[jira] [Assigned] (SPARK-16002) Sleep when no new data arrives to avoid 100% CPU usage

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16002:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Sleep when no new data arrives to avoid 100% CPU usage
> --
>
> Key: SPARK-16002
> URL: https://issues.apache.org/jira/browse/SPARK-16002
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> Right now if the trigger is ProcessTrigger(0), StreamExecution will keep 
> polling new data even if there is no data. Then the CPU usage will be 100%. 
> We should add a minimum polling delay when no new data arrives.






[jira] [Assigned] (SPARK-16002) Sleep when no new data arrives to avoid 100% CPU usage

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16002:


Assignee: Shixiong Zhu  (was: Apache Spark)

> Sleep when no new data arrives to avoid 100% CPU usage
> --
>
> Key: SPARK-16002
> URL: https://issues.apache.org/jira/browse/SPARK-16002
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> Right now if the trigger is ProcessTrigger(0), StreamExecution will keep 
> polling new data even if there is no data. Then the CPU usage will be 100%. 
> We should add a minimum polling delay when no new data arrives.






[jira] [Commented] (SPARK-16002) Sleep when no new data arrives to avoid 100% CPU usage

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334802#comment-15334802
 ] 

Apache Spark commented on SPARK-16002:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/13718

> Sleep when no new data arrives to avoid 100% CPU usage
> --
>
> Key: SPARK-16002
> URL: https://issues.apache.org/jira/browse/SPARK-16002
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> Right now if the trigger is ProcessTrigger(0), StreamExecution will keep 
> polling new data even if there is no data. Then the CPU usage will be 100%. 
> We should add a minimum polling delay when no new data arrives.






[jira] [Updated] (SPARK-15966) Fix markdown for Spark Monitoring

2016-06-16 Thread Dhruve Ashar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhruve Ashar updated SPARK-15966:
-
Description: 
The markdown for Spark monitoring needs to be fixed. 
http://spark.apache.org/docs/2.0.0-preview/monitoring.html 
The closing tag is missing for `spark.ui.view.acls.groups`, which is causing 
the markdown to render incorrectly.

  was:
The markdown for Spark monitoring needs to be fixed. 
http://spark.apache.org/docs/2.0.0-preview/monitoring.html



> Fix markdown for Spark Monitoring
> -
>
> Key: SPARK-15966
> URL: https://issues.apache.org/jira/browse/SPARK-15966
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.0.0
>Reporter: Dhruve Ashar
>Priority: Trivial
>
> The markdown for Spark monitoring needs to be fixed. 
> http://spark.apache.org/docs/2.0.0-preview/monitoring.html 
> The closing tag is missing for `spark.ui.view.acls.groups`, which is causing 
> the markdown to render incorrectly.






[jira] [Created] (SPARK-16002) Sleep when no new data arrives to avoid 100% CPU usage

2016-06-16 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-16002:


 Summary: Sleep when no new data arrives to avoid 100% CPU usage
 Key: SPARK-16002
 URL: https://issues.apache.org/jira/browse/SPARK-16002
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu


Right now if the trigger is ProcessTrigger(0), StreamExecution will keep 
polling new data even if there is no data. Then the CPU usage will be 100%. We 
should add a minimum polling delay when no new data arrives.
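
For illustration, a minimal sketch of a polling loop with such a minimum delay on empty polls; this is not the actual StreamExecution code, and {{fetchNewData}}, {{processBatch}}, and {{pollIntervalMs}} are illustrative placeholders:

{code}
// Hedged sketch of a micro-batch polling loop that backs off when a poll returns no data.
// fetchNewData, processBatch, and pollIntervalMs are placeholders, not StreamExecution internals.
object PollingLoopSketch {
  val pollIntervalMs = 10L  // minimum sleep when no new data arrives

  def run(fetchNewData: () => Option[Seq[String]], processBatch: Seq[String] => Unit): Unit = {
    while (true) {
      fetchNewData() match {
        case Some(batch) if batch.nonEmpty =>
          processBatch(batch)           // new data: process immediately, no sleep
        case _ =>
          Thread.sleep(pollIntervalMs)  // no new data: back off instead of spinning at 100% CPU
      }
    }
  }
}
{code}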






[jira] [Commented] (SPARK-16000) Make model loading backward compatible with saved models using old vector columns

2016-06-16 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334763#comment-15334763
 ] 

Yanbo Liang commented on SPARK-16000:
-

We should only add this for models which supported save/load in Spark 1.6. Since we 
do not have a backward-compatibility test framework for save/load at the moment, we 
can only do offline testing right now. If this makes sense, I can work on this issue. 

> Make model loading backward compatible with saved models using old vector 
> columns
> -
>
> Key: SPARK-16000
> URL: https://issues.apache.org/jira/browse/SPARK-16000
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>
> To help users migrate from Spark 1.6. to 2.0, we should make model loading 
> backward compatible with models saved in 1.6. The main incompatibility is the 
> vector column type change.
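
For context, the vector column type change referred to above is the move from {{org.apache.spark.mllib.linalg.Vector}} to {{org.apache.spark.ml.linalg.Vector}} in Spark 2.0. Below is a minimal sketch of converting a single old-style vector (the DataFrame-level tooling is tracked in SPARK-15945); {{asML}} is assumed to be the conversion helper on the old type:

{code}
// Hedged sketch: converting a 1.6-style mllib vector to the 2.0 ml vector type.
// Assumes Spark 2.0 on the classpath and that asML is the conversion helper on the old type.
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.ml.linalg.{Vector => NewVector}

object VectorConversionSketch {
  def main(args: Array[String]): Unit = {
    val oldVec = OldVectors.dense(1.0, 2.0, 3.0)   // 1.6-style vector
    val newVec: NewVector = oldVec.asML            // 2.0-style vector expected by spark.ml
    println(newVec)
  }
}
{code}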






[jira] [Commented] (SPARK-15501) ML 2.0 QA: Scala APIs audit for recommendation

2016-06-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334754#comment-15334754
 ] 

Joseph K. Bradley commented on SPARK-15501:
---

[~mlnick] Is this audit done, or are there checks remaining?

> ML 2.0 QA: Scala APIs audit for recommendation
> --
>
> Key: SPARK-15501
> URL: https://issues.apache.org/jira/browse/SPARK-15501
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML
>Reporter: Nick Pentreath
>Assignee: Nick Pentreath
>Priority: Blocker
>







[jira] [Commented] (SPARK-15786) joinWith bytecode generation calling ByteBuffer.wrap with InternalRow

2016-06-16 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334743#comment-15334743
 ] 

Sean Zhong commented on SPARK-15786:


[~yhuai] Sure, we definitely can improve it.

> joinWith bytecode generation calling ByteBuffer.wrap with InternalRow
> -
>
> Key: SPARK-15786
> URL: https://issues.apache.org/jira/browse/SPARK-15786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Richard Marscher
>Assignee: Sean Zhong
> Fix For: 2.0.0
>
>
> {code}java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 36, Column 107: No applicable constructor/method found 
> for actual parameters "org.apache.spark.sql.catalyst.InternalRow"; candidates 
> are: "public static java.nio.ByteBuffer java.nio.ByteBuffer.wrap(byte[])", 
> "public static java.nio.ByteBuffer java.nio.ByteBuffer.wrap(byte[], int, 
> int)"{code}
> I have been trying to use joinWith along with Option data types to get an 
> approximation of the RDD semantics for outer joins with Dataset, for a 
> nicer Scala API. However, using the Dataset.as[] syntax leads to bytecode 
> generation trying to pass an InternalRow object into the ByteBuffer.wrap 
> function, which expects a byte[] with or without a couple of int qualifiers.
> I have a notebook reproducing this against 2.0 preview in Databricks 
> Community Edition: 
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/160347920874755/1039589581260901/673639177603143/latest.html
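
For reference, a minimal sketch of the usage pattern described above (an outer joinWith followed by re-typing the pair with Option), written against the public Dataset API; the case classes and column names are illustrative, and an existing SparkSession named {{spark}} is assumed:

{code}
// Hedged sketch of the reported pattern: outer joinWith plus .as[] with Option to
// approximate RDD outer-join semantics. Case classes and columns are illustrative.
case class Customer(id: Long, name: String)
case class Order(customerId: Long, amount: Double)

import spark.implicits._

val customers = Seq(Customer(1L, "a"), Customer(2L, "b")).toDS()
val orders    = Seq(Order(1L, 10.0)).toDS()

// The .as[] re-typing with Option is the step the reporter describes as triggering
// the ByteBuffer.wrap/InternalRow codegen failure.
val joined = customers
  .joinWith(orders, customers("id") === orders("customerId"), "left_outer")
  .as[(Customer, Option[Order])]
{code}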






[jira] [Resolved] (SPARK-15749) Make the error message more meaningful

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15749.
---
  Resolution: Fixed
Assignee: Huaxin Gao
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Make the error message more meaningful
> --
>
> Key: SPARK-15749
> URL: https://issues.apache.org/jira/browse/SPARK-15749
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Trivial
> Fix For: 2.0.0
>
>
> For table test1 (C1 varchar (10), C2 varchar (10)), when I insert a row using 
> sqlContext.sql("insert into test1 values ('abc', 'def', 1)")
> I got error message
> Exception in thread "main" java.lang.RuntimeException: Relation[C1#0,C2#1] 
> JDBCRelation(test1)
>  requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE 
> statement generates the same number of columns as its schema.
> The error message is a little confusing. In my simple insert statement, it 
> doesn't have a SELECT clause. 
> I will change the error message to a more general one 
> Exception in thread "main" java.lang.RuntimeException: Relation[C1#0,C2#1] 
> JDBCRelation(test1)
>  requires that the data to be inserted have the same number of columns as the 
> target table.






[jira] [Commented] (SPARK-15643) ML 2.0 QA: migration guide update

2016-06-16 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334735#comment-15334735
 ] 

Yanbo Liang commented on SPARK-15643:
-

Sure

> ML 2.0 QA: migration guide update
> -
>
> Key: SPARK-15643
> URL: https://issues.apache.org/jira/browse/SPARK-15643
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, MLlib
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Update spark.ml and spark.mllib migration guide from 1.6 to 2.0.






[jira] [Resolved] (SPARK-15868) Executors table in Executors tab should sort Executor IDs in numerical order (not alphabetical order)

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15868.
---
  Resolution: Fixed
Assignee: Alex Bozarth
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Executors table in Executors tab should sort Executor IDs in numerical order 
> (not alphabetical order)
> -
>
> Key: SPARK-15868
> URL: https://issues.apache.org/jira/browse/SPARK-15868
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Alex Bozarth
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: spark-webui-executors-sorting-2.png, 
> spark-webui-executors-sorting.png
>
>
> It _appears_ that the Executors table in the Executors tab sorts Executor IDs in 
> alphabetical order while it should sort them numerically. It does the sorting in a 
> more "friendly" way, yet the driver executor appears between 0 and 1?






[jira] [Updated] (SPARK-15947) Make pipeline components backward compatible with old vector columns in Scala/Java

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15947:
--
Summary: Make pipeline components backward compatible with old vector 
columns in Scala/Java  (was: Make pipeline components backward compatible with 
old vector columns)

> Make pipeline components backward compatible with old vector columns in 
> Scala/Java
> --
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-16000 handles 
> backward compatibility in model loading.






[jira] [Closed] (SPARK-15948) Make pipeline components backward compatible with old vector columns in Python

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng closed SPARK-15948.
-
Resolution: Won't Fix

> Make pipeline components backward compatible with old vector columns in Python
> --
>
> Key: SPARK-15948
> URL: https://issues.apache.org/jira/browse/SPARK-15948
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>
> Same as SPARK-15947 but for Python.






[jira] [Resolved] (SPARK-15947) Make pipeline components backward compatible with old vector columns in Scala/Java

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-15947.
---
Resolution: Won't Fix

> Make pipeline components backward compatible with old vector columns in 
> Scala/Java
> --
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-16000 handles 
> backward compatibility in model loading.






[jira] [Commented] (SPARK-15948) Make pipeline components backward compatible with old vector columns in Python

2016-06-16 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334732#comment-15334732
 ] 

Xiangrui Meng commented on SPARK-15948:
---

Marked this as "Won't Do". See SPARK-15947 for reasons.

> Make pipeline components backward compatible with old vector columns in Python
> --
>
> Key: SPARK-15948
> URL: https://issues.apache.org/jira/browse/SPARK-15948
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>
> Same as SPARK-15947 but for Python.






[jira] [Updated] (SPARK-15643) ML 2.0 QA: migration guide update

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15643:
--
Assignee: Yanbo Liang

> ML 2.0 QA: migration guide update
> -
>
> Key: SPARK-15643
> URL: https://issues.apache.org/jira/browse/SPARK-15643
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, MLlib
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Update spark.ml and spark.mllib migration guide from 1.6 to 2.0.






[jira] [Commented] (SPARK-15643) ML 2.0 QA: migration guide update

2016-06-16 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334729#comment-15334729
 ] 

Xiangrui Meng commented on SPARK-15643:
---

[~yanboliang] Please include a paragraph to help users convert vector columns. 
See https://issues.apache.org/jira/browse/SPARK-15947.

> ML 2.0 QA: migration guide update
> -
>
> Key: SPARK-15643
> URL: https://issues.apache.org/jira/browse/SPARK-15643
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, MLlib
>Reporter: Yanbo Liang
>Priority: Blocker
>
> Update spark.ml and spark.mllib migration guide from 1.6 to 2.0.






[jira] [Comment Edited] (SPARK-15947) Make pipeline components backward compatible with old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334725#comment-15334725
 ] 

Xiangrui Meng edited comment on SPARK-15947 at 6/16/16 9:30 PM:


Had an offline discussion with [~josephkb]. There would be a lot of work to 
implement this feature and its tests. A simpler choice is to ask users to manually 
convert the DataFrames at the beginning of the pipeline with the tools implemented 
in SPARK-15945. Then we can update the migration guide (SPARK-15643) to include the 
error message and put this workaround there, so users can search on Google and find 
the solution.

I'm closing this ticket.


was (Author: mengxr):
Had an offline discussion with [~josephkb]. There would be a lot of work to 
implement this feature and its tests. A simpler choice is to ask users to manually 
convert the DataFrames at the beginning of the pipeline with the tools implemented 
in SPARK-15945. Then we can update the migration guide to include the error message 
and put this workaround there, so users can search on Google and find the 
solution.

I'm closing this ticket.

> Make pipeline components backward compatible with old vector columns
> 
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-16000 handles 
> backward compatibility in model loading.






[jira] [Commented] (SPARK-15947) Make pipeline components backward compatible with old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334725#comment-15334725
 ] 

Xiangrui Meng commented on SPARK-15947:
---

Had an offline discussion with [~josephkb]. There would be a lot of work to 
implement this feature and its tests. A simpler choice is to ask users to manually 
convert the DataFrames at the beginning of the pipeline with the tools implemented 
in SPARK-15945. Then we can update the migration guide to include the error message 
and put this workaround there, so users can search on Google and find the 
solution.

I'm closing this ticket.
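
For reference, a minimal sketch of the manual conversion being suggested, assuming the SPARK-15945 utility is {{MLUtils.convertVectorColumnsToML}}; the input DataFrame {{oldDf}} and the "features" column name are illustrative:

{code}
// Hedged sketch: convert old mllib vector columns at the beginning of the pipeline,
// as suggested above. oldDf is an assumed DataFrame with a 1.6-style "features" column.
import org.apache.spark.mllib.util.MLUtils

val newDf = MLUtils.convertVectorColumnsToML(oldDf, "features")
// newDf's "features" column now uses org.apache.spark.ml.linalg.Vector and can be
// fed into spark.ml pipeline stages built against Spark 2.0.
{code}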

> Make pipeline components backward compatible with old vector columns
> 
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-16000 handles 
> backward compatibility in model loading.






[jira] [Updated] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-15343:

Attachment: (was: jersey-client-2.22.2.jar)

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 19 more
> {code}
> On 1.6 everything works fine.

[jira] [Created] (SPARK-16001) request that spark history server write a log entry whenever it (1) tries cleaning old event logs and (2) has found and deleted old event logs

2016-06-16 Thread Thanh (JIRA)
Thanh created SPARK-16001:
-

 Summary: request that spark history server write a log entry 
whenever it (1) tries cleaning old event logs and (2) has found and deleted old 
event logs
 Key: SPARK-16001
 URL: https://issues.apache.org/jira/browse/SPARK-16001
 Project: Spark
  Issue Type: Improvement
Reporter: Thanh


Request that the Spark history server write a log entry whenever it (1) tries to 
clean old event logs and (2) has found and deleted old event logs.

Currently, it doesn't log anything at all unless there is a failure when calling 
cleanLogs().
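
For illustration, a minimal sketch of the kind of logging being requested around the cleaner; the class, method, and parameter names below are placeholders, not the actual history server code:

{code}
// Hedged sketch of the requested logging around event-log cleanup; not the real
// FsHistoryProvider code. Names are placeholders.
import org.slf4j.LoggerFactory

class CleanerSketch {
  private val log = LoggerFactory.getLogger(classOf[CleanerSketch])

  def cleanLogs(expiredLogs: Seq[String]): Unit = {
    log.info(s"Attempting to clean old event logs: ${expiredLogs.size} candidates found")
    expiredLogs.foreach { path =>
      // ... delete the event log here ...
      log.info(s"Deleted old event log $path")
    }
  }
}
{code}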






[jira] [Commented] (SPARK-15767) Decision Tree Regression wrapper in SparkR

2016-06-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334721#comment-15334721
 ] 

Joseph K. Bradley commented on SPARK-15767:
---

[~vectorijk] Notes from sync: Can you please write more about the possible 
APIs?  I'd like to do a comparison of:
* the rpart API
* the MLlib DecisionTreeClassifier and DecisionTreeRegressor APIs

The comparison should list all parameters and their meaning.  The idea is to 
figure out which of the following we can do:
* Best option: Mimic rpart exactly so that R users can switch to spark.rpart 
easily
* Worst option: Sort of mimic rpart, but not exactly because of a difference in 
functionality, such as new parameters from MLlib or differences in behavior.
* Medium option: Avoid rpart API, and instead offer APIs matching 
DecisionTreeClassifier and DecisionTreeRegressor in the Scala/Java/Python APIs
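
As a starting point for that comparison, a minimal sketch of the spark.ml side: DecisionTreeRegressor and a few of the parameters an R wrapper would have to map onto rpart()/rpart.control() arguments. An existing SparkSession {{spark}} and a DataFrame {{training}} with "features"/"label" columns are assumed, and the rpart analogies in the comments are rough:

{code}
// Hedged sketch of the MLlib side of the comparison; parameter mappings to rpart are approximate.
import org.apache.spark.ml.regression.DecisionTreeRegressor

val dt = new DecisionTreeRegressor()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setImpurity("variance")        // regression ("anova"-like) splitting criterion
  .setMaxDepth(5)                 // roughly rpart.control(maxdepth = ...)
  .setMinInstancesPerNode(1)      // roughly rpart.control(minsplit/minbucket = ...)
  .setMinInfoGain(0.0)            // loosely related to rpart.control(cp = ...), semantics differ

val model = dt.fit(training)      // training is an assumed DataFrame with features/label columns
{code}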

> Decision Tree Regression wrapper in SparkR
> --
>
> Key: SPARK-15767
> URL: https://issues.apache.org/jira/browse/SPARK-15767
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>
> Implement a wrapper in SparkR to support decision tree regression. R's native 
> Decision Tree Regression implementation is from the package rpart, with the 
> signature rpart(formula, dataframe, method="anova"). I propose we could implement 
> an API like spark.decisionTreeRegression(dataframe, formula, ...). After having 
> implemented decision tree classification, we could refactor these two into an 
> API more like rpart().






[jira] [Updated] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-15343:

Attachment: jersey-client-2.22.2.jar

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
> Attachments: jersey-client-2.22.2.jar
>
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 19 more
>

[jira] [Commented] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334719#comment-15334719
 ] 

Saisai Shao commented on SPARK-15343:
-

The class ClientConfig still exists, but the package name has changed to 
org.glassfish.xx.

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
> Attachments: jersey-client-2.22.2.jar
>
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass

[jira] [Resolved] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15998.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
> 
>
> Key: SPARK-15998
> URL: https://issues.apache.org/jira/browse/SPARK-15998
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
> Fix For: 2.0.0
>
>
> HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
> predicates will be pushed down into the Hive metastore so that unmatching 
> partitions can be eliminated earlier. The current default value is false.
> So far, the code base does not have such a test case to verify whether this 
> SQLConf properly works.
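
For illustration, a minimal sketch of how such a verification might toggle the conf and inspect the resulting plan; the conf key is assumed to be {{spark.sql.hive.metastorePartitionPruning}}, and the partitioned table {{logs}} with partition column {{ds}} is a placeholder:

{code}
// Hedged sketch; assumes a Hive-enabled SparkSession `spark` and an existing
// partitioned table `logs` with partition column `ds`.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

// With pruning enabled, the partition predicate should be pushed to the Hive
// metastore so non-matching partitions are never listed; a test would assert on
// the plan or on the partitions actually fetched from the metastore.
spark.sql("SELECT count(*) FROM logs WHERE ds = '2016-06-16'").explain(true)
{code}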






[jira] [Updated] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15998:
--
Assignee: Xiao Li

> Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
> 
>
> Key: SPARK-15998
> URL: https://issues.apache.org/jira/browse/SPARK-15998
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
> predicates will be pushed down into the Hive metastore so that unmatching 
> partitions can be eliminated earlier. The current default value is false.
> So far, the code base does not have such a test case to verify whether this 
> SQLConf properly works.






[jira] [Commented] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334712#comment-15334712
 ] 

Marcelo Vanzin commented on SPARK-15343:


Fair point. But I think the right thing then is to just not enable that setting.

We can't just stick to really old libraries that cause other problems just 
because YARN has decided not to move on. Jersey 1.9 causes too many problems 
when it's in the classpath, making it really hard for people to use newer 
versions when they need to. Since vanilla Spark has no ATS support, disabling 
that setting should be ok. Also, it's kinda weird that YARN is even 
instantiating that client automatically when Spark has no need for it, but I 
assume there's a good reason for that.

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execut

[jira] [Resolved] (SPARK-15975) Improper Popen.wait() return code handling in dev/run-tests

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15975.
---
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.6.2
   1.5.3

> Improper Popen.wait() return code handling in dev/run-tests
> ---
>
> Key: SPARK-15975
> URL: https://issues.apache.org/jira/browse/SPARK-15975
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.2, 2.0.0
>
>
> In dev/run-tests.py there's a line where we effectively do
> {code}
> retcode = some_popen_instance.wait()
> if retcode > 0:
>   err
> # else do nothing
> {code}
> but this code is subtlety wrong because Popen's return code will be negative 
> if the child process was terminated by a signal: 
> https://docs.python.org/2/library/subprocess.html#subprocess.Popen.returncode
> We should change this to {{retcode != 0}} so that we properly error out and 
> exit due to termination by signal.






[jira] [Resolved] (SPARK-15978) Some improvement of "Show Tables"

2016-06-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15978.
---
  Resolution: Fixed
Assignee: Bo Meng  (was: Apache Spark)
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Some improvement of "Show Tables"
> -
>
> Key: SPARK-15978
> URL: https://issues.apache.org/jira/browse/SPARK-15978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Bo Meng
>Assignee: Bo Meng
>Priority: Minor
> Fix For: 2.0.0
>
>
> I've found some minor issues in the "show tables" command:
> 1. In SessionCatalog.scala, the listTables(db: String) method calls 
> listTables(formatDatabaseName(db), "*") to list all the tables for a certain 
> db, but in the method listTables(db: String, pattern: String) this db name 
> is formatted once more. So I think we should remove formatDatabaseName() in 
> the caller.
> 2. I suggest adding a sort to listTables(db: String) in InMemoryCatalog.scala, 
> just like listDatabases().
> I will make a PR shortly. 
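
For clarity, a minimal sketch of suggestion (1), letting the two-argument overload do the name formatting exactly once; this paraphrases the proposal with stubbed lookups and is not the actual patch:

{code}
// Hedged sketch of suggestion (1): the one-argument overload delegates without
// pre-formatting the database name, so formatDatabaseName() runs exactly once.
// The catalog lookup and formatDatabaseName are stubbed; pattern filtering is omitted.
object ListTablesSketch {
  private val catalog = Map("db1" -> Seq("t2", "t1"))

  private def formatDatabaseName(db: String): String = db.toLowerCase

  def listTables(db: String): Seq[String] = listTables(db, "*")

  def listTables(db: String, pattern: String): Seq[String] = {
    val dbName = formatDatabaseName(db)          // the single place the name is formatted
    catalog.getOrElse(dbName, Seq.empty).sorted  // sorted, as suggestion (2) proposes
  }
}
{code}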






[jira] [Comment Edited] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334683#comment-15334683
 ] 

Saisai Shao edited comment on SPARK-15343 at 6/16/16 9:06 PM:
--

[~vanzin] [~srowen], I don't think this is vendor-specific code; look at the stack 
trace, it is thrown from {{YarnClientImpl}}. If we enable 
{{hadoop.yarn.timeline-service.enabled}} we will always hit this problem, no 
matter whether on Hadoop 2.6 or 2.7 (Apache Hadoop or the HDP one).


was (Author: jerryshao):
[~vanzin] [~srowen], I don't think this is vendor-specific code; look at the stack 
trace, it is thrown from {{YarnClientImpl}}. If we enable 
{{hadoop.yarn.timeline-service.enabled}} we will always hit this problem, no 
matter whether on Hadoop 2.6 or 2.7.

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.Constr

[jira] [Commented] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-06-16 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334683#comment-15334683
 ] 

Saisai Shao commented on SPARK-15343:
-

[~vanzin] [~srowen], I don't think this is vendor-specific code. Looking at the 
stack trace, the exception is thrown from {{YarnClientImpl}}; if we enable 
{{hadoop.yarn.timeline-service.enabled}}, we will always hit this problem, 
whether on Hadoop 2.6 or 2.7.

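
A minimal workaround sketch, assuming the underlying Hadoop property is 
yarn.timeline-service.enabled and that Spark's {{spark.hadoop.*}} pass-through is 
used to set it; if the timeline client is never initialized, the missing Jersey 
classes are never needed:

{code}
// Hedged sketch, not an official fix: disable the YARN timeline client so that
// YarnClientImpl never touches the Jersey classes. The property name and the
// spark.hadoop.* pass-through are assumptions based on the stack trace below.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("TimelineServiceWorkaround")
  .setMaster("yarn")
  // forwarded into the Hadoop Configuration as yarn.timeline-service.enabled=false
  .set("spark.hadoop.yarn.timeline-service.enabled", "false")

val sc = new SparkContext(conf)
{code}
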
> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting the following error
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in <module>
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at jav

[jira] [Resolved] (SPARK-15796) Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15796.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13618
[https://github.com/apache/spark/pull/13618]

> Reduce spark.memory.fraction default to avoid overrunning old gen in JVM 
> default config
> ---
>
> Key: SPARK-15796
> URL: https://issues.apache.org/jira/browse/SPARK-15796
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Gabor Feher
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: baseline.txt, memfrac06.txt, memfrac063.txt, 
> memfrac066.txt
>
>
> While debugging performance issues in a Spark program, I've found a simple 
> way to slow down Spark 1.6 significantly by filling the RDD memory cache. 
> This seems to be a regression, because setting 
> "spark.memory.useLegacyMode=true" fixes the problem. Here is a repro that is 
> just a simple program that fills the memory cache of Spark using a 
> MEMORY_ONLY cached RDD (but of course this comes up in more complex 
> situations, too):
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.storage.StorageLevel
> object CacheDemoApp { 
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("Cache Demo Application")   
> 
> val sc = new SparkContext(conf)
> val startTime = System.currentTimeMillis()
>   
> 
> val cacheFiller = sc.parallelize(1 to 5, 1000)
> 
>   .mapPartitionsWithIndex {
> case (ix, it) =>
>   println(s"CREATE DATA PARTITION ${ix}") 
> 
>   val r = new scala.util.Random(ix)
>   it.map(x => (r.nextLong, r.nextLong))
>   }
> cacheFiller.persist(StorageLevel.MEMORY_ONLY)
> cacheFiller.foreach(identity)
> val finishTime = System.currentTimeMillis()
> val elapsedTime = (finishTime - startTime) / 1000
> println(s"TIME= $elapsedTime s")
>   }
> }
> {code}
> If I call it the following way, it completes in around 5 minutes on my 
> laptop, while frequently pausing for slow Full GC cycles. I can also see with 
> jvisualvm (Visual GC plugin) that the old generation of JVM is 96.8% filled.
> {code}
> sbt package
> ~/spark-1.6.0/bin/spark-submit \
>   --class "CacheDemoApp" \
>   --master "local[2]" \
>   --driver-memory 3g \
>   --driver-java-options "-XX:+PrintGCDetails" \
>   target/scala-2.10/simple-project_2.10-1.0.jar
> {code}
> If I add any one of the below flags, then the run-time drops to around 40-50 
> seconds and the difference is coming from the drop in GC times:
>   --conf "spark.memory.fraction=0.6"
> OR
>   --conf "spark.memory.useLegacyMode=true"
> OR
>   --driver-java-options "-XX:NewRatio=3"
> All the other cache types except for DISK_ONLY produce similar symptoms. It 
> looks like the problem is that the amount of data Spark wants to store 
> long-term ends up being larger than the old generation size in the JVM and 
> this triggers Full GC repeatedly.
> I did some research:
> * In Spark 1.6, spark.memory.fraction is the upper limit on cache size. It 
> defaults to 0.75.
> * In Spark 1.5, spark.storage.memoryFraction is the upper limit on cache 
> size. It defaults to 0.6 and...
> * http://spark.apache.org/docs/1.5.2/configuration.html even says that it 
> shouldn't be bigger than the size of the old generation.
> * On the other hand, OpenJDK's default NewRatio is 2, which means an old 
> generation size of 66%. Hence the default value in Spark 1.6 contradicts this 
> advice.
> http://spark.apache.org/docs/1.6.1/tuning.html recommends that if the old 
> generation is running close to full, then setting 
> spark.memory.storageFraction to a lower value should help. I have tried with 
> spark.memory.storageFraction=0.1, but it still doesn't fix the issue. This is 
> not a surprise: http://spark.apache.org/docs/1.6.1/configuration.html 
> explains that storageFraction is not an upper-limit but a lower limit-like 
> thing on the size of Spark's cache. The real upper limit is 
> spark.memory.fraction.
> To sum up my questions/issues:
> * At least http://spark.apache.org/docs/1.6.1/tuning.html should be fixed. 
> Maybe the old generation size should also be mentioned in configuration.html 
> near spark.memory.fraction.
> * Is it a goal for Spark to support heavy caching with default parameters and 
> without GC breakdown? If so, then better default values are needed.

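A minimal sketch of the same mitigation applied programmatically rather than via 
spark-submit flags; the 0.6 value simply mirrors the --conf example above and is 
an illustration, not an officially recommended default:

{code}
// Hedged sketch: set the fraction in code so cached data stays within a
// default-sized old generation (NewRatio=2 gives roughly 2/3 of the heap).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Cache Demo Application")
  .set("spark.memory.fraction", "0.6")
  // alternative mitigation from the description: pre-1.6 memory manager
  // .set("spark.memory.useLegacyMode", "true")

val sc = new SparkContext(conf)
{code}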


--
This message was sent by A

[jira] [Assigned] (SPARK-15796) Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-15796:
-

Assignee: Sean Owen

> Reduce spark.memory.fraction default to avoid overrunning old gen in JVM 
> default config
> ---
>
> Key: SPARK-15796
> URL: https://issues.apache.org/jira/browse/SPARK-15796
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Gabor Feher
>Assignee: Sean Owen
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: baseline.txt, memfrac06.txt, memfrac063.txt, 
> memfrac066.txt
>
>
> While debugging performance issues in a Spark program, I've found a simple 
> way to slow down Spark 1.6 significantly by filling the RDD memory cache. 
> This seems to be a regression, because setting 
> "spark.memory.useLegacyMode=true" fixes the problem. Here is a repro that is 
> just a simple program that fills the memory cache of Spark using a 
> MEMORY_ONLY cached RDD (but of course this comes up in more complex 
> situations, too):
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.storage.StorageLevel
> object CacheDemoApp { 
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("Cache Demo Application")   
> 
> val sc = new SparkContext(conf)
> val startTime = System.currentTimeMillis()
>   
> 
> val cacheFiller = sc.parallelize(1 to 5, 1000)
> 
>   .mapPartitionsWithIndex {
> case (ix, it) =>
>   println(s"CREATE DATA PARTITION ${ix}") 
> 
>   val r = new scala.util.Random(ix)
>   it.map(x => (r.nextLong, r.nextLong))
>   }
> cacheFiller.persist(StorageLevel.MEMORY_ONLY)
> cacheFiller.foreach(identity)
> val finishTime = System.currentTimeMillis()
> val elapsedTime = (finishTime - startTime) / 1000
> println(s"TIME= $elapsedTime s")
>   }
> }
> {code}
> If I call it the following way, it completes in around 5 minutes on my 
> laptop, while frequently pausing for slow Full GC cycles. I can also see with 
> jvisualvm (Visual GC plugin) that the old generation of JVM is 96.8% filled.
> {code}
> sbt package
> ~/spark-1.6.0/bin/spark-submit \
>   --class "CacheDemoApp" \
>   --master "local[2]" \
>   --driver-memory 3g \
>   --driver-java-options "-XX:+PrintGCDetails" \
>   target/scala-2.10/simple-project_2.10-1.0.jar
> {code}
> If I add any one of the below flags, then the run-time drops to around 40-50 
> seconds and the difference is coming from the drop in GC times:
>   --conf "spark.memory.fraction=0.6"
> OR
>   --conf "spark.memory.useLegacyMode=true"
> OR
>   --driver-java-options "-XX:NewRatio=3"
> All the other cache types except for DISK_ONLY produce similar symptoms. It 
> looks like the problem is that the amount of data Spark wants to store 
> long-term ends up being larger than the old generation size in the JVM and 
> this triggers Full GC repeatedly.
> I did some research:
> * In Spark 1.6, spark.memory.fraction is the upper limit on cache size. It 
> defaults to 0.75.
> * In Spark 1.5, spark.storage.memoryFraction is the upper limit on cache 
> size. It defaults to 0.6 and...
> * http://spark.apache.org/docs/1.5.2/configuration.html even says that it 
> shouldn't be bigger than the size of the old generation.
> * On the other hand, OpenJDK's default NewRatio is 2, which means an old 
> generation size of 66%. Hence the default value in Spark 1.6 contradicts this 
> advice.
> http://spark.apache.org/docs/1.6.1/tuning.html recommends that if the old 
> generation is running close to full, then setting 
> spark.memory.storageFraction to a lower value should help. I have tried with 
> spark.memory.storageFraction=0.1, but it still doesn't fix the issue. This is 
> not a surprise: http://spark.apache.org/docs/1.6.1/configuration.html 
> explains that storageFraction is not an upper-limit but a lower limit-like 
> thing on the size of Spark's cache. The real upper limit is 
> spark.memory.fraction.
> To sum up my questions/issues:
> * At least http://spark.apache.org/docs/1.6.1/tuning.html should be fixed. 
> Maybe the old generation size should also be mentioned in configuration.html 
> near spark.memory.fraction.
> * Is it a goal for Spark to support heavy caching with default parameters and 
> without GC breakdown? If so, then better default values are needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

--

[jira] [Updated] (SPARK-16000) Make model loading backward compatible with saved models using old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16000:
--
Description: To help users migrate from Spark 1.6 to 2.0, we should make 
model loading backward compatible with models saved in 1.6. The main 
incompatibility is the vector column type change.

> Make model loading backward compatible with saved models using old vector 
> columns
> -
>
> Key: SPARK-16000
> URL: https://issues.apache.org/jira/browse/SPARK-16000
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>
> To help users migrate from Spark 1.6 to 2.0, we should make model loading 
> backward compatible with models saved in 1.6. The main incompatibility is the 
> vector column type change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15947) Make pipeline components backward compatible with old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15947:
--
Description: 
After SPARK-15945, we should make ALL pipeline components accept old vector 
columns as input and do the conversion automatically (probably with a warning 
message), in order to smooth the migration to 2.0. 

--Note that this includes loading old saved models.-- SPARK-16000 handles 
backward compatibility in model loading.

  was:
After SPARK-15945, we should make ALL pipeline components accept old vector 
columns as input and do the conversion automatically (probably with a warning 
message), in order to smooth the migration to 2.0. 

--Note that this includes loading old saved models.-- SPARK-15948 handles 
backward compatibility in model loading.


> Make pipeline components backward compatible with old vector columns
> 
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-16000 handles 
> backward compatibility in model loading.
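
A minimal sketch of the manual conversion users otherwise have to perform, 
assuming the MLUtils.convertVectorColumnsToML helper from SPARK-15945 and a 
spark-shell session (so {{spark}} is in scope); automatic conversion inside the 
pipeline components would make this explicit step unnecessary:

{code}
// Hedged sketch: convert an old mllib vector column to the new ml vector type
// by hand. The toy data and column names are assumptions for illustration.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

val df = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(1.0, 2.0)),   // old org.apache.spark.mllib.linalg vectors
  (1.0, Vectors.dense(3.0, 4.0))
)).toDF("label", "features")

val converted = MLUtils.convertVectorColumnsToML(df, "features")
converted.printSchema()  // "features" is now the new ml.linalg vector type
{code}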



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15922) BlockMatrix to IndexedRowMatrix throws an error

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15922.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13643
[https://github.com/apache/spark/pull/13643]

> BlockMatrix to IndexedRowMatrix throws an error
> ---
>
> Key: SPARK-15922
> URL: https://issues.apache.org/jira/browse/SPARK-15922
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Charlie Evans
> Fix For: 2.0.0
>
>
> {code}
> import org.apache.spark.mllib.linalg.distributed._
> import org.apache.spark.mllib.linalg._
> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, 
> new DenseVector(Array(1,2,3))):: IndexedRow(2L, new 
> DenseVector(Array(1,2,3))):: Nil
> val rdd = sc.parallelize(rows)
> val matrix = new IndexedRowMatrix(rdd, 3, 3)
> val bmat = matrix.toBlockMatrix
> val imat = bmat.toIndexedRowMatrix
> imat.rows.collect // this throws an error - Caused by: 
> java.lang.IllegalArgumentException: requirement failed: Vectors must be the 
> same length!
> {code}
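
Until the fix is in, one possible interim workaround, continuing the repro above, 
is to route the conversion through a CoordinateMatrix so the affected 
BlockMatrix.toIndexedRowMatrix path is avoided; this is an assumption based on 
the public APIs, not a verified fix:

{code}
// Hedged sketch of a possible workaround: convert via CoordinateMatrix instead
// of calling toIndexedRowMatrix directly on the BlockMatrix (bmat from above).
val imat2 = bmat.toCoordinateMatrix.toIndexedRowMatrix
imat2.rows.collect()
{code}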



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15922) BlockMatrix to IndexedRowMatrix throws an error

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15922:
--
Assignee: Dongjoon Hyun

> BlockMatrix to IndexedRowMatrix throws an error
> ---
>
> Key: SPARK-15922
> URL: https://issues.apache.org/jira/browse/SPARK-15922
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Charlie Evans
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> {code}
> import org.apache.spark.mllib.linalg.distributed._
> import org.apache.spark.mllib.linalg._
> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, 
> new DenseVector(Array(1,2,3))):: IndexedRow(2L, new 
> DenseVector(Array(1,2,3))):: Nil
> val rdd = sc.parallelize(rows)
> val matrix = new IndexedRowMatrix(rdd, 3, 3)
> val bmat = matrix.toBlockMatrix
> val imat = bmat.toIndexedRowMatrix
> imat.rows.collect // this throws an error - Caused by: 
> java.lang.IllegalArgumentException: requirement failed: Vectors must be the 
> same length!
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16000) Make model loading backward compatible with saved models using old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16000:
--
Summary: Make model loading backward compatible with saved models using old 
vector columns  (was: Make model loading backward compatible with saved models 
using old vector columns in Scala/Java)

> Make model loading backward compatible with saved models using old vector 
> columns
> -
>
> Key: SPARK-16000
> URL: https://issues.apache.org/jira/browse/SPARK-16000
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15947) Make pipeline components backward compatible with old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15947:
--
Description: 
After SPARK-15945, we should make ALL pipeline components accept old vector 
columns as input and do the conversion automatically (probably with a warning 
message), in order to smooth the migration to 2.0. 

--Note that this includes loading old saved models.-- SPARK-15948 handles 
backward compatibility in model loading.

  was:After SPARK-15945, we should make ALL pipeline components accept old 
vector columns as input and do the conversion automatically (probably with a 
warning message), in order to smooth the migration to 2.0. Note that this 
includes loading old saved models.


> Make pipeline components backward compatible with old vector columns
> 
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. 
> --Note that this includes loading old saved models.-- SPARK-15948 handles 
> backward compatibility in model loading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15947) Make pipeline components backward compatible with old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15947:
--
Summary: Make pipeline components backward compatible with old vector 
columns  (was: Make pipeline components backward compatible with old vector 
columns in Scala/Java)

> Make pipeline components backward compatible with old vector columns
> 
>
> Key: SPARK-15947
> URL: https://issues.apache.org/jira/browse/SPARK-15947
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> After SPARK-15945, we should make ALL pipeline components accept old vector 
> columns as input and do the conversion automatically (probably with a warning 
> message), in order to smooth the migration to 2.0. Note that this includes 
> loading old saved models.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16000) Make model loading backward compatible with saved models using old vector columns

2016-06-16 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-16000:
-

 Summary: Make model loading backward compatible with saved models 
using old vector columns
 Key: SPARK-16000
 URL: https://issues.apache.org/jira/browse/SPARK-16000
 Project: Spark
  Issue Type: Sub-task
  Components: ML, MLlib
Reporter: Xiangrui Meng






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16000) Make model loading backward compatible with saved models using old vector columns in Scala/Java

2016-06-16 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16000:
--
Summary: Make model loading backward compatible with saved models using old 
vector columns in Scala/Java  (was: Make model loading backward compatible with 
saved models using old vector columns)

> Make model loading backward compatible with saved models using old vector 
> columns in Scala/Java
> ---
>
> Key: SPARK-16000
> URL: https://issues.apache.org/jira/browse/SPARK-16000
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: Xiangrui Meng
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15731) orc writer directory permissions

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15731.
---
Resolution: Cannot Reproduce

> orc writer directory permissions
> 
>
> Key: SPARK-15731
> URL: https://issues.apache.org/jira/browse/SPARK-15731
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1
>Reporter: Ran Haim
>
> When saving ORC files with partitions, the partition directories created do 
> not have the x (execute) permission (even though the umask is 002), so no other 
> users can get inside those directories to read the ORC files.
> When writing Parquet files there is no such issue.
> Code example:
> dataframe.write.format("orc").mode("append").partitionBy("date").save("/path")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14048) Aggregation operations on structs fail when the structs have fields with special characters

2016-06-16 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334657#comment-15334657
 ] 

Sean Zhong commented on SPARK-14048:


[~simeons]  

I can now reproduce this on Databricks Community Edition by changing the above 
notebook script to:

{code}
val rdd = sc.makeRDD(
  """{"st": {"x.y": 1}, "age": 10}""" :: """{"st": {"x.y": 2}, "age": 10}""" :: 
"""{"st": {"x.y": 2}, "age": 20}""" :: Nil)
sqlContext.read.json(rdd).registerTempTable("test")
%sql select first(st) as st from test group by age
{code}

Thanks! I will post the updates later.

> Aggregation operations on structs fail when the structs have fields with 
> special characters
> ---
>
> Key: SPARK-14048
> URL: https://issues.apache.org/jira/browse/SPARK-14048
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Databricks w/ 1.6.0
>Reporter: Simeon Simeonov
>  Labels: sql
> Attachments: bug_structs_with_backticks.html
>
>
> Consider a schema where a struct has field names with special characters, 
> e.g.,
> {code}
>  |-- st: struct (nullable = true)
>  ||-- x.y: long (nullable = true)
> {code}
> Schemas such as these are frequently generated by the JSON schema generator, 
> which never seems to map JSON data to {{MapType}}, always preferring 
> to use {{StructType}}. 
> In SparkSQL, referring to these fields requires backticks, e.g., 
> {{st.`x.y`}}. There is no problem manipulating these structs unless one is 
> using an aggregation function. It seems that, under the covers, the code is 
> not escaping fields with special characters correctly.
> For example, 
> {code}
> select first(st) as st from tbl group by something
> {code}
> generates
> {code}
> org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: 
> struct. If you have a struct and a field name of it has any 
> special characters, please use backticks (`) to quote that field name, e.g. 
> `x+y`. Please note that backtick itself is not supported in a field name.
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at scala.util.Try$.apply(Try.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15977) TRUNCATE TABLE does not work with Datasource tables outside of Hive

2016-06-16 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-15977.
---
Resolution: Resolved

> TRUNCATE TABLE does not work with Datasource tables outside of Hive
> ---
>
> Key: SPARK-15977
> URL: https://issues.apache.org/jira/browse/SPARK-15977
> Project: Spark
>  Issue Type: Bug
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>
> The {{TRUNCATE TABLE}} command does not work with datasource tables without 
> Hive support. For example, the following doesn't work:
> {noformat}
> DROP TABLE IF EXISTS test
> CREATE TABLE test(a INT, b STRING) USING JSON
> INSERT INTO test VALUES (1, 'a'), (2, 'b'), (3, 'c')
> SELECT * FROM test
> TRUNCATE TABLE test
> SELECT * FROM test
> {noformat}
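
For convenience, a minimal sketch that drives the same statements through a 
SparkSession built without Hive support, which is the configuration where the 
command fails; the app name, master and show() calls are illustrative assumptions:

{code}
// Hedged sketch: run the repro statements above against a Hive-less session.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("TruncateTableRepro")
  .master("local[2]")
  .getOrCreate()

spark.sql("DROP TABLE IF EXISTS test")
spark.sql("CREATE TABLE test(a INT, b STRING) USING JSON")
spark.sql("INSERT INTO test VALUES (1, 'a'), (2, 'b'), (3, 'c')")
spark.sql("SELECT * FROM test").show()
spark.sql("TRUNCATE TABLE test")   // fails here without Hive support
spark.sql("SELECT * FROM test").show()
{code}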



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14048) Aggregation operations on structs fail when the structs have fields with special characters

2016-06-16 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334644#comment-15334644
 ] 

Sean Zhong commented on SPARK-14048:


[~simeons]  

Can you share a complete notebook that we can run to reproduce the problem 
you saw? For example, the file {{include/init_scala}} is missing from your 
notebook.

> Aggregation operations on structs fail when the structs have fields with 
> special characters
> ---
>
> Key: SPARK-14048
> URL: https://issues.apache.org/jira/browse/SPARK-14048
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Databricks w/ 1.6.0
>Reporter: Simeon Simeonov
>  Labels: sql
> Attachments: bug_structs_with_backticks.html
>
>
> Consider a schema where a struct has field names with special characters, 
> e.g.,
> {code}
>  |-- st: struct (nullable = true)
>  ||-- x.y: long (nullable = true)
> {code}
> Schemas such as these are frequently generated by the JSON schema generator, 
> which never seems to map JSON data to {{MapType}}, always preferring 
> to use {{StructType}}. 
> In SparkSQL, referring to these fields requires backticks, e.g., 
> {{st.`x.y`}}. There is no problem manipulating these structs unless one is 
> using an aggregation function. It seems that, under the covers, the code is 
> not escaping fields with special characters correctly.
> For example, 
> {code}
> select first(st) as st from tbl group by something
> {code}
> generates
> {code}
> org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: 
> struct. If you have a struct and a field name of it has any 
> special characters, please use backticks (`) to quote that field name, e.g. 
> `x+y`. Please note that backtick itself is not supported in a field name.
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at scala.util.Try$.apply(Try.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15069) GSoC 2016: Exposing more R and Python APIs for MLlib

2016-06-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334622#comment-15334622
 ] 

Joseph K. Bradley commented on SPARK-15069:
---

h4. 6/16/2016 - Week 4

To-do items
* Continuation of doc items: [SPARK-15672]
* Decision tree API [SPARK-15767] -> I'll add notes to this JIRA
* If there is time, begin work on forests or boosting.


> GSoC 2016: Exposing more R and Python APIs for MLlib
> 
>
> Key: SPARK-15069
> URL: https://issues.apache.org/jira/browse/SPARK-15069
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Kai Jiang
>  Labels: gsoc2016, mentor
> Attachments: 1458791046_[GSoC2016]ApacheSpark_KaiJiang_Proposal.pdf
>
>
> This issue is for tracking the Google Summer of Code 2016 project for Kai 
> Jiang: "Apache Spark: Exposing more R and Python APIs for MLlib"
> See attached proposal for details.  Note that the tasks listed in the 
> proposal are tentative and can adapt as the community works on these various 
> parts of MLlib.
> This umbrella will contain links for tasks included in this project, to be 
> added as each task begins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15981) Fix bug in python DataStreamReader

2016-06-16 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-15981.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix bug in python DataStreamReader
> --
>
> Key: SPARK-15981
> URL: https://issues.apache.org/jira/browse/SPARK-15981
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Blocker
> Fix For: 2.0.0
>
>
> A bug in the Python DataStreamReader API made it unusable. Because a single path 
> was being converted to an array before calling the Java DataStreamReader method 
> (which takes only a string), it gave the following error. 
> {code}
> File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", 
> line 947, in pyspark.sql.readwriter.DataStreamReader.json
> Failed example:
> json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 
> 'data'), schema = sdf_schema)
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
>  line 1253, in __run
> compileflags, 1) in test.globs
>   File "", line 
> 1, in 
> json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 
> 'data'), schema = sdf_schema)
>   File 
> "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", line 
> 963, in json
> return self._df(self._jreader.json(path))
>   File 
> "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/utils.py", 
> line 63, in deco
> return f(*a, **kw)
>   File 
> "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 316, in get_return_value
> format(target_id, ".", name, value))
> Py4JError: An error occurred while calling o121.json. Trace:
> py4j.Py4JException: Method json([class java.util.ArrayList]) does not 
> exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
>   at py4j.Gateway.invoke(Gateway.java:272)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:211)
>   at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15999) Wrong/Missing information for Spark UI/REST port

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15999.
---
Resolution: Not A Problem

You're referring to an old version -- normally we report JIRAs against 
master. But this aspect hasn't changed, and I don't think it's confusing. The 
Spark application (driver) UI tries to bind to 4040, then 4041, etc. if 4040 is 
not available. This is true for streaming jobs as well.

You haven't specified what error you encounter when trying to access the REST 
service, but presumably it's not port-related.

This has enough problems that I think it should be closed. Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first.

> Wrong/Missing information for Spark UI/REST port
> 
>
> Key: SPARK-15999
> URL: https://issues.apache.org/jira/browse/SPARK-15999
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Streaming
>Affects Versions: 1.5.0
> Environment: CDH5.5.2, Spark 1.5.0
>Reporter: Faisal
>Priority: Minor
>
> *Spark Monitoring documentation*
> https://spark.apache.org/docs/1.5.0/monitoring.html
> {quote}
> You can access this interface by simply opening http://<driver-node>:4040 in 
> a web browser. If multiple SparkContexts are running on the same host, they 
> will bind to successive ports beginning with 4040 (4041, 4042, etc).
> {quote}
> This statement is very confusing and doesn't apply at all to Spark streaming 
> jobs (unless I am missing something).
> The same is the case with the REST API calls.
> {quote}
> REST API
> In addition to viewing the metrics in the UI, they are also available as 
> JSON. This gives developers an easy way to create new visualizations and 
> monitoring tools for Spark. The JSON is available for both running 
> applications, and in the history server. The endpoints are mounted at 
> /api/v1. Eg., for the history server, they would typically be accessible at 
> http://<server-url>:18080/api/v1, and for a running application, at 
> http://localhost:4040/api/v1.
> {quote}
> I am running a Spark streaming job on CDH 5.5.2 with Spark 1.5.0,
> and nowhere on the driver node or executor nodes for the running/live application am I 
> able to call the REST service.
> My Spark streaming jobs run in yarn-cluster mode:
> --master yarn-cluster
> However, for the history server
> I am able to call the REST service and pull up JSON messages
> using the URL
> http://historyServer:18088/api/v1/applications
> {code}
> [ {
>   "id" : "application_1463099418950_11465",
>   "name" : "PySparkShell",
>   "attempts" : [ {
> "startTime" : "2016-06-15T15:28:32.460GMT",
> "endTime" : "2016-06-15T19:01:39.100GMT",
> "sparkUser" : "abc",
> "completed" : true
>   } ]
> }, {
>   "id" : "application_1463099418950_11635",
>   "name" : "DataProcessor-ETL.ETIME",
>   "attempts" : [ {
> "attemptId" : "1",
> "startTime" : "2016-06-15T18:56:04.413GMT",
> "endTime" : "2016-06-15T18:58:00.022GMT",
> "sparkUser" : "abc",
> "completed" : true
>   } ]
> }, 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15999) Wrong/Missing information for Spark UI/REST port

2016-06-16 Thread Faisal (JIRA)
Faisal created SPARK-15999:
--

 Summary: Wrong/Missing information for Spark UI/REST port
 Key: SPARK-15999
 URL: https://issues.apache.org/jira/browse/SPARK-15999
 Project: Spark
  Issue Type: Bug
  Components: Documentation, Streaming
Affects Versions: 1.5.0
 Environment: CDH5.5.2, Spark 1.5.0
Reporter: Faisal
Priority: Minor


*Spark Monitoring documentation*

https://spark.apache.org/docs/1.5.0/monitoring.html

{quote}
You can access this interface by simply opening http://<driver-node>:4040 in a 
web browser. If multiple SparkContexts are running on the same host, they will 
bind to successive ports beginning with 4040 (4041, 4042, etc).
{quote}
This statement is very confusing and doesn't apply at all to Spark streaming 
jobs (unless I am missing something).

The same is the case with the REST API calls.
{quote}
REST API
In addition to viewing the metrics in the UI, they are also available as JSON. 
This gives developers an easy way to create new visualizations and monitoring 
tools for Spark. The JSON is available for both running applications, and in 
the history server. The endpoints are mounted at /api/v1. Eg., for the history 
server, they would typically be accessible at http://<server-url>:18080/api/v1, 
and for a running application, at http://localhost:4040/api/v1.
{quote}

I am running a Spark streaming job on CDH 5.5.2 with Spark 1.5.0,
and nowhere on the driver node or executor nodes for the running/live application am I 
able to call the REST service.
My Spark streaming jobs run in yarn-cluster mode:
--master yarn-cluster

However, for the history server
I am able to call the REST service and pull up JSON messages
using the URL
http://historyServer:18088/api/v1/applications
{code}
[ {
  "id" : "application_1463099418950_11465",
  "name" : "PySparkShell",
  "attempts" : [ {
"startTime" : "2016-06-15T15:28:32.460GMT",
"endTime" : "2016-06-15T19:01:39.100GMT",
"sparkUser" : "abc",
"completed" : true
  } ]
}, {
  "id" : "application_1463099418950_11635",
  "name" : "DataProcessor-ETL.ETIME",
  "attempts" : [ {
"attemptId" : "1",
"startTime" : "2016-06-15T18:56:04.413GMT",
"endTime" : "2016-06-15T18:58:00.022GMT",
"sparkUser" : "abc",
"completed" : true
  } ]
}, 
{code}
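
For completeness, a minimal sketch of querying the history server endpoint shown 
above from a Scala shell; the host and port come from the reporter's environment 
and are assumptions here:

{code}
// Hedged sketch: fetch the same history-server REST endpoint listed above.
import scala.io.Source

val json = Source.fromURL("http://historyServer:18088/api/v1/applications").mkString
println(json)
{code}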





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12114) ColumnPruning rule fails in case of "Project <- Filter <- Join"

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12114:
--
Assignee: Min Qiu

> ColumnPruning rule fails in case of "Project <- Filter <- Join"
> ---
>
> Key: SPARK-12114
> URL: https://issues.apache.org/jira/browse/SPARK-12114
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Min Qiu
>Assignee: Min Qiu
> Fix For: 2.0.0
>
>
> For the query
> {code}
> SELECT c_name, c_custkey, o_orderkey, o_orderdate, 
>o_totalprice, sum(l_quantity) 
> FROM customer join orders join lineitem 
>   on c_custkey = o_custkey AND o_orderkey = l_orderkey 
>  left outer join (SELECT l_orderkey tmp_orderkey 
>   FROM lineitem 
>   GROUP BY l_orderkey 
>   HAVING sum(l_quantity) > 300) tmp 
>   on o_orderkey = tmp_orderkey 
> WHERE tmp_orderkey IS NOT NULL 
> GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice 
> ORDER BY o_totalprice DESC, o_orderdate
> {code}
> The optimizedPlan is 
> {code}
> Sort \[o_totalprice#48 DESC,o_orderdate#49 ASC]
>  
>  Aggregate 
> \[c_name#38,c_custkey#37,o_orderkey#45,o_orderdate#49,o_totalprice#48], 
> \[c_name#38,c_custkey#37,o_orderkey#45,
> o_orderdate#49,o_totalprice#48,SUM(l_quantity#58) AS _c5#36]
>   {color: green}Project 
> \[c_name#38,o_orderdate#49,c_custkey#37,o_orderkey#45,o_totalprice#48,l_quantity#58]
>Filter IS NOT NULL tmp_orderkey#35
> Join LeftOuter, Some((o_orderkey#45 = tmp_orderkey#35)){color}
>  Join Inner, Some((c_custkey#37 = o_custkey#46))
>   MetastoreRelation default, customer, None
>   Join Inner, Some((o_orderkey#45 = l_orderkey#54))
>MetastoreRelation default, orders, None
>MetastoreRelation default, lineitem, None
>  Project \[tmp_orderkey#35]
>   Filter havingCondition#86
>Aggregate \[l_orderkey#70], \[(SUM(l_quantity#74) > 300.0) AS 
> havingCondition#86,l_orderkey#70 AS tmp_orderkey#35]
> Project \[l_orderkey#70,l_quantity#74]
>  MetastoreRelation default, lineitem, None
> {code}
> Due to the pattern highlighted in green, which the ColumnPruning rule fails to 
> deal with, all columns of the lineitem and orders tables are scanned. The 
> unneeded columns are also involved in the data shuffling. The performance is 
> extremely bad if either of the two tables is big.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9689) Cache doesn't refresh for HadoopFsRelation based table

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9689:
-
Assignee: Cheng Hao

> Cache doesn't refresh for HadoopFsRelation based table
> --
>
> Key: SPARK-9689
> URL: https://issues.apache.org/jira/browse/SPARK-9689
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheng Hao
>Assignee: Cheng Hao
> Fix For: 2.0.0
>
>
> {code:title=example|borderStyle=solid}
> // create a HadoopFsRelation based table
> sql(s"""
> |CREATE TEMPORARY TABLE jsonTable (a int, b string)
> |USING org.apache.spark.sql.json.DefaultSource
> |OPTIONS (
> |  path '${path.toString}'
> |)""".stripMargin)
>   
> // populate jsonTable with values from table jt
> sql(
>   s"""
>   |INSERT OVERWRITE TABLE jsonTable SELECT a, b FROM jt
> """.stripMargin)
> // cache the HadoopFsRelation Table
> sqlContext.cacheTable("jsonTable")
>
> // update the HadoopFsRelation Table
> sql(
>   s"""
> |INSERT OVERWRITE TABLE jsonTable SELECT a * 2, b FROM jt
>   """.stripMargin)
> // Even this will fail
>  sql("SELECT a, b FROM jsonTable").collect()
> // This will fail, as the cache doesn't refresh
> checkAnswer(
>   sql("SELECT a, b FROM jsonTable"),
>   sql("SELECT a * 2, b FROM jt").collect())
> {code}
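
Until the cache is refreshed automatically, one possible interim step, continuing 
the example above, is to drop and rebuild the cached copy after the second INSERT 
OVERWRITE; this is an assumption and may not help if the underlying relation also 
caches file listings:

{code}
// Hedged sketch: rebuild the cached copy by hand after rewriting the files.
sqlContext.uncacheTable("jsonTable")
sqlContext.cacheTable("jsonTable")
sql("SELECT a, b FROM jsonTable").collect()  // should now reflect the rewritten data
{code}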



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11882) Allow for running Spark applications against a custom coarse grained scheduler

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11882.
---
   Resolution: Duplicate
Fix Version/s: (was: 2.0.0)

> Allow for running Spark applications against a custom coarse grained scheduler
> --
>
> Key: SPARK-11882
> URL: https://issues.apache.org/jira/browse/SPARK-11882
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, Spark Submit
>Reporter: Jacek Lewandowski
>Priority: Minor
>
> SparkContext decides which scheduler to use according to the master 
> URI. How about running applications against a custom scheduler? Such a custom 
> scheduler would just extend {{CoarseGrainedSchedulerBackend}}. 
> The custom scheduler would be created by a provided factory. Factories would 
> be defined in the configuration like 
> {{spark.scheduler.factory.<name>=<factory class>}}, where {{name}} is the 
> scheduler name. Once {{SparkContext}} learns that the master address is not 
> for standalone, YARN, Mesos, local or any other predefined scheduler, it 
> would resolve the scheme from the provided master URI and look for the scheduler 
> factory with the name equal to the resolved scheme. 
> For example:
> {{spark.scheduler.factory.custom=org.a.b.c.CustomSchedulerFactory}}
> then the master address would be {{custom://192.168.1.1}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-11882) Allow for running Spark applications against a custom coarse grained scheduler

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-11882:
---

Just resolving as duplicate then

> Allow for running Spark applications against a custom coarse grained scheduler
> --
>
> Key: SPARK-11882
> URL: https://issues.apache.org/jira/browse/SPARK-11882
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core, Spark Submit
>Reporter: Jacek Lewandowski
>Priority: Minor
>
> SparkContext decides which scheduler to use according to the master 
> URI. How about running applications against a custom scheduler? Such a custom 
> scheduler would just extend {{CoarseGrainedSchedulerBackend}}. 
> The custom scheduler would be created by a provided factory. Factories would 
> be defined in the configuration like 
> {{spark.scheduler.factory.<name>=<factory class>}}, where {{name}} is the 
> scheduler name. Once {{SparkContext}} learns that the master address is not 
> for standalone, YARN, Mesos, local or any other predefined scheduler, it 
> would resolve the scheme from the provided master URI and look for the scheduler 
> factory with the name equal to the resolved scheme. 
> For example:
> {{spark.scheduler.factory.custom=org.a.b.c.CustomSchedulerFactory}}
> then the master address would be {{custom://192.168.1.1}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-12248:
---

> Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
> --
>
> Key: SPARK-12248
> URL: https://issues.apache.org/jira/browse/SPARK-12248
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
> Fix For: 2.0.0
>
>
> It is possible to have Spark apps that work best with either more memory or 
> more CPU.
> In a multi-tenant environment (such as Mesos) it can be very beneficial to be 
> able to limit the coarse-grained scheduler to guarantee an executor doesn't subscribe 
> to too many CPUs or too much memory.
> This ask is to add functionality to the coarse-grained Mesos scheduler to support basic 
> limits on the ratio of memory to CPU, which default to the current behavior 
> of soaking up whatever resources it can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15996) Fix R examples by removing deprecated functions

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15996:
--
Assignee: Dongjoon Hyun

> Fix R examples by removing deprecated functions
> ---
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, the R examples (dataframe.R and data-manipulation.R) fail like the 
> following. We had better update them before releasing the 2.0 RC. This issue 
> updates them to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12248.
---
Resolution: Not A Problem

> Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
> --
>
> Key: SPARK-12248
> URL: https://issues.apache.org/jira/browse/SPARK-12248
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
> Fix For: 2.0.0
>
>
> It is possible to have Spark apps that work best with either more memory or 
> more CPU.
> In a multi-tenant environment (such as Mesos) it can be very beneficial to be 
> able to limit the Coarse scheduler to guarantee an executor doesn't subscribe 
> to too many CPUs or too much memory.
> This ask is to add functionality to the Coarse Mesos Scheduler for basic 
> limits on the ratio of memory to CPU, defaulting to the current behavior of 
> soaking up whatever resources it can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15934) Return binary mode in ThriftServer

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-15934:
--
Assignee: Egor Pakhomov

> Return binary mode in ThriftServer
> --
>
> Key: SPARK-15934
> URL: https://issues.apache.org/jira/browse/SPARK-15934
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Egor Pahomov
>Assignee: Egor Pakhomov
>Priority: Critical
> Fix For: 2.0.0
>
>
> In the spark-2.0.0 preview, binary mode was turned off (SPARK-15095). 
> That was a highly irresponsible step, given that binary mode was the default 
> in 1.6.1 and is now turned off in 2.0.0.
> Just to describe the magnitude of harm that not fixing this bug would do in my 
> organization:
> * Tableau works only through the Thrift Server and only with the binary format. 
> Tableau would not work with spark-2.0.0 at all!
> * I have a bunch of analysts in my organization with configured SQL 
> clients (DataGrip and Squirrel). I would need to go one by one to change the 
> connection string for them (DataGrip). Squirrel simply does not work with http - 
> some jar hell in my case.
> * And that is without mentioning all the other things that connect to our data 
> infrastructure through the Thrift Server as a gateway. 
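
For context, the transport mode shows up directly in every client's JDBC URL. A hedged illustration follows; the host name, ports and HTTP path are assumptions, and the exact URL parameters depend on the Hive JDBC driver in use:

{code}
// Requires the Hive JDBC driver on the classpath; values below are placeholders.
import java.sql.DriverManager

object ThriftServerModes {
  // Binary (TCP) transport: what 1.6.x clients such as Tableau and Squirrel connect to by default.
  val binaryUrl = "jdbc:hive2://thrift-host:10000/default"

  // HTTP transport: every client's connection string has to be rewritten along these lines.
  val httpUrl = "jdbc:hive2://thrift-host:10001/default;transportMode=http;httpPath=cliservice"

  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(binaryUrl, "user", "")
    try {
      val rs = conn.createStatement().executeQuery("SELECT 1")
      while (rs.next()) println(rs.getInt(1))
    } finally conn.close()
  }
}
{code}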



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10757) Java friendly constructor for distributed matrices

2016-06-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10757.
---
Resolution: Won't Fix

> Java friendly constructor for distributed matrices
> --
>
> Key: SPARK-10757
> URL: https://issues.apache.org/jira/browse/SPARK-10757
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API, MLlib
>Reporter: Yanbo Liang
>Priority: Minor
>
> Currently users cannot construct 
> BlockMatrix/RowMatrix/IndexedRowMatrix/CoordinateMatrix from the Java side because 
> these classes do not provide Java-friendly constructors. 
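
A small sketch of the kind of convenience being asked for, written as a user-side helper since the MLlib classes themselves cannot be changed from outside Spark; the object name is made up, while {{JavaRDD.rdd}} and the {{RowMatrix}} constructor shown are existing APIs:

{code}
// Illustrative helper only: unwraps a JavaRDD into the Scala RDD the existing constructor expects.
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object JavaFriendlyMatrices {
  // What a "Java-friendly constructor" would effectively do for RowMatrix.
  def rowMatrix(rows: JavaRDD[Vector]): RowMatrix = new RowMatrix(rows.rdd)
}
{code}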



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15467) Getting stack overflow when attempting to query a wide Dataset (>200 fields)

2016-06-16 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334541#comment-15334541
 ] 

Herman van Hovell commented on SPARK-15467:
---

[~kiszk] Shouldn't this be opened against the new repo, 
https://github.com/janino-compiler/janino ?

> Getting stack overflow when attempting to query a wide Dataset (>200 fields)
> 
>
> Key: SPARK-15467
> URL: https://issues.apache.org/jira/browse/SPARK-15467
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Don Drake
>
> This can be reproduced in a spark-shell; I am running Spark 2.0.0-preview.
> {code}
> import spark.implicits._
> case class Wide(
> val f0:String = "",
> val f1:String = "",
> val f2:String = "",
> val f3:String = "",
> val f4:String = "",
> val f5:String = "",
> val f6:String = "",
> val f7:String = "",
> val f8:String = "",
> val f9:String = "",
> val f10:String = "",
> val f11:String = "",
> val f12:String = "",
> val f13:String = "",
> val f14:String = "",
> val f15:String = "",
> val f16:String = "",
> val f17:String = "",
> val f18:String = "",
> val f19:String = "",
> val f20:String = "",
> val f21:String = "",
> val f22:String = "",
> val f23:String = "",
> val f24:String = "",
> val f25:String = "",
> val f26:String = "",
> val f27:String = "",
> val f28:String = "",
> val f29:String = "",
> val f30:String = "",
> val f31:String = "",
> val f32:String = "",
> val f33:String = "",
> val f34:String = "",
> val f35:String = "",
> val f36:String = "",
> val f37:String = "",
> val f38:String = "",
> val f39:String = "",
> val f40:String = "",
> val f41:String = "",
> val f42:String = "",
> val f43:String = "",
> val f44:String = "",
> val f45:String = "",
> val f46:String = "",
> val f47:String = "",
> val f48:String = "",
> val f49:String = "",
> val f50:String = "",
> val f51:String = "",
> val f52:String = "",
> val f53:String = "",
> val f54:String = "",
> val f55:String = "",
> val f56:String = "",
> val f57:String = "",
> val f58:String = "",
> val f59:String = "",
> val f60:String = "",
> val f61:String = "",
> val f62:String = "",
> val f63:String = "",
> val f64:String = "",
> val f65:String = "",
> val f66:String = "",
> val f67:String = "",
> val f68:String = "",
> val f69:String = "",
> val f70:String = "",
> val f71:String = "",
> val f72:String = "",
> val f73:String = "",
> val f74:String = "",
> val f75:String = "",
> val f76:String = "",
> val f77:String = "",
> val f78:String = "",
> val f79:String = "",
> val f80:String = "",
> val f81:String = "",
> val f82:String = "",
> val f83:String = "",
> val f84:String = "",
> val f85:String = "",
> val f86:String = "",
> val f87:String = "",
> val f88:String = "",
> val f89:String = "",
> val f90:String = "",
> val f91:String = "",
> val f92:String = "",
> val f93:String = "",
> val f94:String = "",
> val f95:String = "",
> val f96:String = "",
> val f97:String = "",
> val f98:String = "",
> val f99:String = "",
> val f100:String = "",
> val f101:String = "",
> val f102:String = "",
> val f103:String = "",
> val f104:String = "",
> val f105:String = "",
> val f106:String = "",
> val f107:String = "",
> val f108:String = "",
> val f109:String = "",
> val f110:String = "",
> val f111:String = "",
> val f112:String = "",
> val f113:String = "",
> val f114:String = "",
> val f115:String = "",
> val f116:String = "",
> val f117:String = "",
> val f118:String = "",
> val f119:String = "",
> val f120:String = "",
> val f121:String = "",
> val f122:String = "",
> val f123:String = "",
> val f124:String = "",
> val f125:String = "",
> val f126:String = "",
> val f127:String = "",
> val f128:String = "",
> val f129:String = "",
> val f130:String = "",
> val f131:String = "",
> val f132:String = "",
> val f133:String = "",
> val f134:String = "",
> val f135:String = "",
> val f136:String = "",
> val f137:String = "",
> val f138:String = "",
> val f139:String = "",
> val f140:String = "",
> val f141:String = "",
> val f142:String = "",
> val f143:String = "",
> val f144:String = "",
> val f145:String = "",
> val f146:String = "",
> val f147:String = "",
> val f148:String = "",
> val f149:String = "",
> val f150:String = "",
> val f151:String = "",
> val f152:String = "",
> val f153:String = "",
> val f154:String = "",
> val f155:String = "",
> val f156:String = "",
> val f157:String = "",
> val f158:String = "",
> val f159:String = "",
> val f160:String = "",
> val f161:String = "",
> val f162:String = "",
> val f163:String = "",
> val f164:String = "",
> val f165:String = "",
> val f166:String = "",
> val f167:String = "",
> val f168:String = "",
> val f169:String = "",
> val f170:String = "",
> val f171:String = "",
> val f172:String = "",
> val f173:String = "",
> val f174:String = "

[jira] [Resolved] (SPARK-15996) Fix R examples by removing deprecated functions

2016-06-16 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-15996.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13714
[https://github.com/apache/spark/pull/13714]

> Fix R examples by removing deprecated functions
> ---
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, the R examples (dataframe.R and data-manipulation.R) fail like the 
> following. We had better update them before releasing the 2.0 RC. This issue 
> updates them to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15811) Python UDFs do not work in Spark 2.0-preview built with scala 2.10

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334474#comment-15334474
 ] 

Apache Spark commented on SPARK-15811:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/13717

> Python UDFs do not work in Spark 2.0-preview built with scala 2.10
> --
>
> Key: SPARK-15811
> URL: https://issues.apache.org/jira/browse/SPARK-15811
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Franklyn Dsouza
>Assignee: Davies Liu
>Priority: Blocker
>
> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
> {code}
> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
> {code}
> and then ran the following code in a pyspark shell
> {code}
> from pyspark.sql import SparkSession 
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 
> DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()
> {code}
> This never returns with a result. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15811) Python UDFs do not work in Spark 2.0-preview built with scala 2.10

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15811:


Assignee: Davies Liu  (was: Apache Spark)

> Python UDFs do not work in Spark 2.0-preview built with scala 2.10
> --
>
> Key: SPARK-15811
> URL: https://issues.apache.org/jira/browse/SPARK-15811
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Franklyn Dsouza
>Assignee: Davies Liu
>Priority: Blocker
>
> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
> {code}
> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
> {code}
> and then ran the following code in a pyspark shell
> {code}
> from pyspark.sql import SparkSession 
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 
> DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()
> {code}
> This never returns with a result. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15811) Python UDFs do not work in Spark 2.0-preview built with scala 2.10

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15811:


Assignee: Apache Spark  (was: Davies Liu)

> Python UDFs do not work in Spark 2.0-preview built with scala 2.10
> --
>
> Key: SPARK-15811
> URL: https://issues.apache.org/jira/browse/SPARK-15811
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Franklyn Dsouza
>Assignee: Apache Spark
>Priority: Blocker
>
> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
> {code}
> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
> {code}
> and then ran the following code in a pyspark shell
> {code}
> from pyspark.sql import SparkSession 
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 
> DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()
> {code}
> This never returns with a result. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14048) Aggregation operations on structs fail when the structs have fields with special characters

2016-06-16 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334456#comment-15334456
 ] 

Simeon Simeonov edited comment on SPARK-14048 at 6/16/16 7:02 PM:
--

[~clockfly] The above code executes with no error on the same cluster where the 
example I shared fails. As I had speculated earlier, there must be something in 
the particular data structures we have that triggers the problem, which you can 
see in the attached notebook.


was (Author: simeons):
[~clockfly] The code executes with no error on the same cluster where the 
example I shared fails. As I had speculated earlier, there must be something in 
the particular data structures we have that triggers the problem, which you can 
see in the attached notebook.

> Aggregation operations on structs fail when the structs have fields with 
> special characters
> ---
>
> Key: SPARK-14048
> URL: https://issues.apache.org/jira/browse/SPARK-14048
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Databricks w/ 1.6.0
>Reporter: Simeon Simeonov
>  Labels: sql
> Attachments: bug_structs_with_backticks.html
>
>
> Consider a schema where a struct has field names with special characters, 
> e.g.,
> {code}
>  |-- st: struct (nullable = true)
>  ||-- x.y: long (nullable = true)
> {code}
> Schemas such as these are frequently generated by the JSON schema generator, 
> which never seems to map JSON data to {{MapType}}, always preferring 
> {{StructType}}. 
> In SparkSQL, referring to these fields requires backticks, e.g., 
> {{st.`x.y`}}. There is no problem manipulating these structs unless one is 
> using an aggregation function. It seems that, under the covers, the code is 
> not escaping fields with special characters correctly.
> For example, 
> {code}
> select first(st) as st from tbl group by something
> {code}
> generates
> {code}
> org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: 
> struct. If you have a struct and a field name of it has any 
> special characters, please use backticks (`) to quote that field name, e.g. 
> `x+y`. Please note that backtick itself is not supported in a field name.
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at scala.util.Try$.apply(Try.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
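
For readers trying to reproduce the shape of the problem, a minimal spark-shell sketch that builds a struct with a dotted field name and runs the failing aggregation; {{named_struct}} is a standard SQL function, the table name is arbitrary, and per the comments above this minimal form may not trigger the failure on every cluster:

{code}
// Minimal sketch (1.6.x spark-shell style); names are arbitrary.
val df = sqlContext.sql("SELECT named_struct('x.y', 1L) AS st, 'a' AS something")
df.registerTempTable("tbl")
// The aggregation described above; note the struct field can only be referenced as st.`x.y`.
sqlContext.sql("SELECT first(st) AS st FROM tbl GROUP BY something").show()
{code}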



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-14048) Aggregation operations on structs fail when the structs have fields with special characters

2016-06-16 Thread Simeon Simeonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334456#comment-15334456
 ] 

Simeon Simeonov commented on SPARK-14048:
-

[~clockfly] The code executes with no error on the same cluster where the 
example I shared fails. As I had speculated earlier, there must be something in 
the particular data structures we have that triggers the problem, which you can 
see in the attached notebook.

> Aggregation operations on structs fail when the structs have fields with 
> special characters
> ---
>
> Key: SPARK-14048
> URL: https://issues.apache.org/jira/browse/SPARK-14048
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: Databricks w/ 1.6.0
>Reporter: Simeon Simeonov
>  Labels: sql
> Attachments: bug_structs_with_backticks.html
>
>
> Consider a schema where a struct has field names with special characters, 
> e.g.,
> {code}
>  |-- st: struct (nullable = true)
>  ||-- x.y: long (nullable = true)
> {code}
> Schemas such as these are frequently generated by the JSON schema generator, 
> which never seems to map JSON data to {{MapType}}, always preferring 
> {{StructType}}. 
> In SparkSQL, referring to these fields requires backticks, e.g., 
> {{st.`x.y`}}. There is no problem manipulating these structs unless one is 
> using an aggregation function. It seems that, under the covers, the code is 
> not escaping fields with special characters correctly.
> For example, 
> {code}
> select first(st) as st from tbl group by something
> {code}
> generates
> {code}
> org.apache.spark.sql.catalyst.util.DataTypeException: Unsupported dataType: 
> struct. If you have a struct and a field name of it has any 
> special characters, please use backticks (`) to quote that field name, e.g. 
> `x+y`. Please note that backtick itself is not supported in a field name.
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$class.toDataType(DataTypeParser.scala:100)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:112)
>   at 
> org.apache.spark.sql.catalyst.util.DataTypeParser$.parse(DataTypeParser.scala:116)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:884)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:395)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$$anonfun$toJsonSchema$1.apply(OutputAggregator.scala:394)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.toJsonSchema(OutputAggregator.scala:394)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.maybeApplyOutputAggregation(OutputAggregator.scala:122)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:82)
>   at 
> com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:42)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.executeSql(DriverLocal.scala:306)
>   at 
> com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$3.apply(DriverWrapper.scala:467)
>   at scala.util.Try$.apply(Try.scala:161)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:464)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:365)
>   at 
> com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:196)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15467) Getting stack overflow when attempting to query a wide Dataset (>200 fields)

2016-06-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334443#comment-15334443
 ] 

Kazuaki Ishizaki commented on SPARK-15467:
--

We are waiting for the author's review at https://github.com/aunkrig/janino/pull/7

> Getting stack overflow when attempting to query a wide Dataset (>200 fields)
> 
>
> Key: SPARK-15467
> URL: https://issues.apache.org/jira/browse/SPARK-15467
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Don Drake
>
> This can be reproduced in a spark-shell; I am running Spark 2.0.0-preview.
> {code}
> import spark.implicits._
> case class Wide(
> val f0:String = "",
> val f1:String = "",
> val f2:String = "",
> val f3:String = "",
> val f4:String = "",
> val f5:String = "",
> val f6:String = "",
> val f7:String = "",
> val f8:String = "",
> val f9:String = "",
> val f10:String = "",
> val f11:String = "",
> val f12:String = "",
> val f13:String = "",
> val f14:String = "",
> val f15:String = "",
> val f16:String = "",
> val f17:String = "",
> val f18:String = "",
> val f19:String = "",
> val f20:String = "",
> val f21:String = "",
> val f22:String = "",
> val f23:String = "",
> val f24:String = "",
> val f25:String = "",
> val f26:String = "",
> val f27:String = "",
> val f28:String = "",
> val f29:String = "",
> val f30:String = "",
> val f31:String = "",
> val f32:String = "",
> val f33:String = "",
> val f34:String = "",
> val f35:String = "",
> val f36:String = "",
> val f37:String = "",
> val f38:String = "",
> val f39:String = "",
> val f40:String = "",
> val f41:String = "",
> val f42:String = "",
> val f43:String = "",
> val f44:String = "",
> val f45:String = "",
> val f46:String = "",
> val f47:String = "",
> val f48:String = "",
> val f49:String = "",
> val f50:String = "",
> val f51:String = "",
> val f52:String = "",
> val f53:String = "",
> val f54:String = "",
> val f55:String = "",
> val f56:String = "",
> val f57:String = "",
> val f58:String = "",
> val f59:String = "",
> val f60:String = "",
> val f61:String = "",
> val f62:String = "",
> val f63:String = "",
> val f64:String = "",
> val f65:String = "",
> val f66:String = "",
> val f67:String = "",
> val f68:String = "",
> val f69:String = "",
> val f70:String = "",
> val f71:String = "",
> val f72:String = "",
> val f73:String = "",
> val f74:String = "",
> val f75:String = "",
> val f76:String = "",
> val f77:String = "",
> val f78:String = "",
> val f79:String = "",
> val f80:String = "",
> val f81:String = "",
> val f82:String = "",
> val f83:String = "",
> val f84:String = "",
> val f85:String = "",
> val f86:String = "",
> val f87:String = "",
> val f88:String = "",
> val f89:String = "",
> val f90:String = "",
> val f91:String = "",
> val f92:String = "",
> val f93:String = "",
> val f94:String = "",
> val f95:String = "",
> val f96:String = "",
> val f97:String = "",
> val f98:String = "",
> val f99:String = "",
> val f100:String = "",
> val f101:String = "",
> val f102:String = "",
> val f103:String = "",
> val f104:String = "",
> val f105:String = "",
> val f106:String = "",
> val f107:String = "",
> val f108:String = "",
> val f109:String = "",
> val f110:String = "",
> val f111:String = "",
> val f112:String = "",
> val f113:String = "",
> val f114:String = "",
> val f115:String = "",
> val f116:String = "",
> val f117:String = "",
> val f118:String = "",
> val f119:String = "",
> val f120:String = "",
> val f121:String = "",
> val f122:String = "",
> val f123:String = "",
> val f124:String = "",
> val f125:String = "",
> val f126:String = "",
> val f127:String = "",
> val f128:String = "",
> val f129:String = "",
> val f130:String = "",
> val f131:String = "",
> val f132:String = "",
> val f133:String = "",
> val f134:String = "",
> val f135:String = "",
> val f136:String = "",
> val f137:String = "",
> val f138:String = "",
> val f139:String = "",
> val f140:String = "",
> val f141:String = "",
> val f142:String = "",
> val f143:String = "",
> val f144:String = "",
> val f145:String = "",
> val f146:String = "",
> val f147:String = "",
> val f148:String = "",
> val f149:String = "",
> val f150:String = "",
> val f151:String = "",
> val f152:String = "",
> val f153:String = "",
> val f154:String = "",
> val f155:String = "",
> val f156:String = "",
> val f157:String = "",
> val f158:String = "",
> val f159:String = "",
> val f160:String = "",
> val f161:String = "",
> val f162:String = "",
> val f163:String = "",
> val f164:String = "",
> val f165:String = "",
> val f166:String = "",
> val f167:String = "",
> val f168:String = "",
> val f169:String = "",
> val f170:String = "",
> val f171:String = "",
> val f172:String = "",
> val f173:String = "",
> val f174:String = "",
> val f175:String =

[jira] [Updated] (SPARK-15811) Python UDFs do not work in Spark 2.0-preview built with scala 2.10

2016-06-16 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-15811:
---
Description: 
I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following

{code}
./dev/change-version-to-2.10.sh
./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
-Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
{code}

and then ran the following code in a pyspark shell

{code}
from pyspark.sql import SparkSession 
from pyspark.sql.types import IntegerType, StructField, StructType
from pyspark.sql.functions import udf
from pyspark.sql.types import Row
spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
add_one = udf(lambda x: x + 1, IntegerType())
schema = StructType([StructField('a', IntegerType(), False)])
df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
df.select(add_one(df.a).alias('incremented')).collect()
{code}

This never returns with a result. 


  was:
I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following

{code}
./dev/change-version-to-2.10.sh
./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
-Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
{code}

and then ran the following code in a pyspark shell

{code}
from pyspark.sql.types import IntegerType, StructField, StructType
from pyspark.sql.functions import udf
from pyspark.sql.types import Row
spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
add_one = udf(lambda x: x + 1, IntegerType())
schema = StructType([StructField('a', IntegerType(), False)])
df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
df.select(add_one(df.a).alias('incremented')).collect()
{code}

This never returns with a result. 



> Python UDFs do not work in Spark 2.0-preview built with scala 2.10
> --
>
> Key: SPARK-15811
> URL: https://issues.apache.org/jira/browse/SPARK-15811
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Franklyn Dsouza
>Assignee: Davies Liu
>Priority: Blocker
>
> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
> {code}
> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
> {code}
> and then ran the following code in a pyspark shell
> {code}
> from pyspark.sql import SparkSession 
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 
> DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = sqlContext.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()
> {code}
> This never returns with a result. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15998:


Assignee: Apache Spark

> Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
> 
>
> Key: SPARK-15998
> URL: https://issues.apache.org/jira/browse/SPARK-15998
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
> predicates will be pushed down into the Hive metastore so that non-matching 
> partitions can be eliminated earlier. The current default value is false.
> So far, the code base does not have a test case verifying that this 
> SQLConf works properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334418#comment-15334418
 ] 

Apache Spark commented on SPARK-15998:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13716

> Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
> 
>
> Key: SPARK-15998
> URL: https://issues.apache.org/jira/browse/SPARK-15998
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
> predicates will be pushed down into the Hive metastore so that non-matching 
> partitions can be eliminated earlier. The current default value is false.
> So far, the code base does not have a test case verifying that this 
> SQLConf works properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15998:


Assignee: (was: Apache Spark)

> Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
> 
>
> Key: SPARK-15998
> URL: https://issues.apache.org/jira/browse/SPARK-15998
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
> predicates will be pushed down into the Hive metastore so that non-matching 
> partitions can be eliminated earlier. The current default value is false.
> So far, the code base does not have a test case verifying that this 
> SQLConf works properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15998) Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING

2016-06-16 Thread Xiao Li (JIRA)
Xiao Li created SPARK-15998:
---

 Summary: Verification of SQLConf HIVE_METASTORE_PARTITION_PRUNING
 Key: SPARK-15998
 URL: https://issues.apache.org/jira/browse/SPARK-15998
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


HIVE_METASTORE_PARTITION_PRUNING is a public SQLConf. When true, some 
predicates will be pushed down into the Hive metastore so that non-matching 
partitions can be eliminated earlier. The current default value is false.

So far, the code base does not have a test case verifying that this 
SQLConf works properly.
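
A rough sketch of what such a verification could look like in a spark-shell with Hive support; the table name and values are made up, the configuration key is assumed to be {{spark.sql.hive.metastorePartitionPruning}}, and a real test would assert on the partitions actually fetched from the metastore rather than just running the query:

{code}
// Sketch only: set the conf, populate a partitioned table, and query a single partition.
spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

spark.sql("CREATE TABLE part_t (value INT) PARTITIONED BY (p INT)")
spark.sql("INSERT INTO TABLE part_t PARTITION (p = 1) SELECT 10")
spark.sql("INSERT INTO TABLE part_t PARTITION (p = 2) SELECT 20")

// With pruning enabled, only partition p = 1 should be requested from the metastore.
spark.sql("SELECT * FROM part_t WHERE p = 1").show()
{code}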



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15997) Audit ml.feature Update documentation for ml feature transformers

2016-06-16 Thread Gayathri Murali (JIRA)
Gayathri Murali created SPARK-15997:
---

 Summary: Audit ml.feature Update documentation for ml feature 
transformers
 Key: SPARK-15997
 URL: https://issues.apache.org/jira/browse/SPARK-15997
 Project: Spark
  Issue Type: Documentation
  Components: ML, MLlib
Affects Versions: 2.0.0
Reporter: Gayathri Murali


This JIRA is a subtask of SPARK-15100 and improves documentation for new 
features added to:
1. HashingTF
2. CountVectorizer
3. QuantileDiscretizer




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15996) Fix R examples by removing deprecated functions

2016-06-16 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-15996:
--
Description: 
Currently, the R examples (dataframe.R and data-manipulation.R) fail like the 
following. We had better update them before releasing the 2.0 RC. This issue 
updates them to use up-to-date APIs.
{code}
$ bin/spark-submit examples/src/main/r/dataframe.R 
...
Warning message:
'createDataFrame(sqlContext...)' is deprecated.
Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
See help("Deprecated") 
...
Warning message:
'read.json(sqlContext...)' is deprecated.
Use 'read.json(path)' instead.
See help("Deprecated") 
...
Error: could not find function "registerTempTable"
Execution halted
{code}

  was:
Currently, the R dataframe example fails like the following. We had better update 
it before releasing the 2.0 RC. This issue updates it to use up-to-date APIs.
{code}
$ bin/spark-submit examples/src/main/r/dataframe.R 
...
Warning message:
'createDataFrame(sqlContext...)' is deprecated.
Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
See help("Deprecated") 
...
Warning message:
'read.json(sqlContext...)' is deprecated.
Use 'read.json(path)' instead.
See help("Deprecated") 
...
Error: could not find function "registerTempTable"
Execution halted
{code}

Summary: Fix R examples by removing deprecated functions  (was: Fix R 
dataframe example by removing deprecated functions)

> Fix R examples by removing deprecated functions
> ---
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> Currently, the R examples (dataframe.R and data-manipulation.R) fail like the 
> following. We had better update them before releasing the 2.0 RC. This issue 
> updates them to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15981) Fix bug in python DataStreamReader

2016-06-16 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-15981:
--
Description: 
A bug in the Python DataStreamReader API made it unusable. Because a single path was 
being converted to an array before calling the Java DataStreamReader method (which 
takes only a string), it gave the following error. 

{code}
File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", line 
947, in pyspark.sql.readwriter.DataStreamReader.json
Failed example:
json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 'data'),  
   schema = sdf_schema)
Exception raised:
Traceback (most recent call last):
  File 
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
 line 1253, in __run
compileflags, 1) in test.globs
  File "", line 1, 
in 
json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 
'data'), schema = sdf_schema)
  File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", 
line 963, in json
return self._df(self._jreader.json(path))
  File 
"/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
 line 933, in __call__
answer, self.gateway_client, self.target_id, self.name)
  File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/utils.py", line 
63, in deco
return f(*a, **kw)
  File 
"/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
 line 316, in get_return_value
format(target_id, ".", name, value))
Py4JError: An error occurred while calling o121.json. Trace:
py4j.Py4JException: Method json([class java.util.ArrayList]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:744)
{code}




  was:
A bug in the Python DataStreamReader API made it unusable. 




> Fix bug in python DataStreamReader
> --
>
> Key: SPARK-15981
> URL: https://issues.apache.org/jira/browse/SPARK-15981
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Blocker
>
> A bug in the Python DataStreamReader API made it unusable. Because a single path 
> was being converted to an array before calling the Java DataStreamReader method 
> (which takes only a string), it gave the following error. 
> {code}
> File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", 
> line 947, in pyspark.sql.readwriter.DataStreamReader.json
> Failed example:
> json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 
> 'data'), schema = sdf_schema)
> Exception raised:
> Traceback (most recent call last):
>   File 
> "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py",
>  line 1253, in __run
> compileflags, 1) in test.globs
>   File "", line 
> 1, in 
> json_sdf = spark.readStream.json(os.path.join(tempfile.mkdtemp(), 
> 'data'), schema = sdf_schema)
>   File 
> "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/readwriter.py", line 
> 963, in json
> return self._df(self._jreader.json(path))
>   File 
> "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "/Users/tdas/Projects/Spark/spark/python/pyspark/sql/utils.py", 
> line 63, in deco
> return f(*a, **kw)
>   File 
> "/Users/tdas/Projects/Spark/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 316, in get_return_value
> format(target_id, ".", name, value))
> Py4JError: An error occurred while calling o121.json. Trace:
> py4j.Py4JException: Method json([class java.util.ArrayList]) does not 
> exist
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
>   at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
>   at py4j.Gateway.invoke(Gateway.java:272)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:211)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
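
A hedged Scala-side illustration of the call the Python wrapper ultimately delegates to ({{sdfSchema}} and the path below are placeholders): the JVM {{DataStreamReader.json}} takes a single path string, which is why wrapping the Python path in a list produced the {{Method json([class java.util.ArrayList]) does not exist}} error above.

{code}
// Placeholder schema and path; the point is that json(...) is called with one String.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val spark = SparkSession.builder.master("local[4]").appName("stream-json").getOrCreate()
val sdfSchema = StructType(Seq(StructField("a", IntegerType, nullable = false)))

val sdf = spark.readStream
  .schema(sdfSchema)
  .json("/tmp/streaming-data")   // single path string, matching the JVM signature
{code}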



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Closed] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios

2016-06-16 Thread Charles Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen closed SPARK-12248.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
> --
>
> Key: SPARK-12248
> URL: https://issues.apache.org/jira/browse/SPARK-12248
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
> Fix For: 2.0.0
>
>
> It is possible to have Spark apps that work best with either more memory or 
> more CPU.
> In a multi-tenant environment (such as Mesos) it can be very beneficial to be 
> able to limit the Coarse scheduler to guarantee an executor doesn't subscribe 
> to too many CPUs or too much memory.
> This ask is to add functionality to the Coarse Mesos Scheduler for basic 
> limits on the ratio of memory to CPU, defaulting to the current behavior of 
> soaking up whatever resources it can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12248) Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios

2016-06-16 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334376#comment-15334376
 ] 

Charles Allen commented on SPARK-12248:
---

The limit of one task per slave seems to have been removed. That solves at 
least my use case in this matter. 

> Make Spark Coarse Mesos Scheduler obey limits on memory/cpu ratios
> --
>
> Key: SPARK-12248
> URL: https://issues.apache.org/jira/browse/SPARK-12248
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
> Fix For: 2.0.0
>
>
> It is possible to have Spark apps that work best with either more memory or 
> more CPU.
> In a multi-tenant environment (such as Mesos) it can be very beneficial to be 
> able to limit the Coarse scheduler to guarantee an executor doesn't subscribe 
> to too many CPUs or too much memory.
> This ask is to add functionality to the Coarse Mesos Scheduler for basic 
> limits on the ratio of memory to CPU, defaulting to the current behavior of 
> soaking up whatever resources it can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow

2016-06-16 Thread Charles Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated SPARK-15992:
--
Attachment: (was: 
0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch)

> Code cleanup mesos coarse backend offer evaluation workflow
> ---
>
> Key: SPARK-15992
> URL: https://issues.apache.org/jira/browse/SPARK-15992
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>  Labels: code-cleanup
>
> The offer acceptance workflow is a little hard to follow and not very 
> extensible for future offer-handling considerations. This is a patch that makes 
> the workflow a little more explicit in its handling of offer resources.
> Patch incoming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend

2016-06-16 Thread Charles Allen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Allen updated SPARK-15994:
--
Attachment: (was: 0001-Add-ability-to-enable-mesos-fetch-cache.patch)

> Allow enabling Mesos fetch cache in coarse executor backend 
> 
>
> Key: SPARK-15994
> URL: https://issues.apache.org/jira/browse/SPARK-15994
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>
> Mesos 0.23.0 introduces a Fetch Cache feature 
> http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of 
> resources specified in command URIs.
> This patch:
> * Updates the Mesos shaded protobuf dependency to 0.23.0
> * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache 
> for all specified URIs. (URIs must be specified for the setting to have any 
> effect)
> * Updates documentation for Mesos configuration with the new setting.
> This patch does NOT:
> * Allow for per-URI caching configuration. The cache setting is global to ALL 
> URIs for the command.
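
Assuming the patch lands with the setting as described ({{spark.mesos.fetchCache.enable}} below is the flag proposed in this ticket, not an existing Spark option, and the URI is a placeholder), enabling it would look like any other configuration:

{code}
// Sketch: spark.mesos.fetchCache.enable is the flag proposed here; spark.mesos.uris is an
// existing setting listing the URIs the Mesos fetcher downloads (and would now cache).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("fetch-cache-demo")
  .set("spark.mesos.fetchCache.enable", "true")
  .set("spark.mesos.uris", "http://example.com/artifacts/extra-deps.jar")
{code}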



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15992:


Assignee: Apache Spark

> Code cleanup mesos coarse backend offer evaluation workflow
> ---
>
> Key: SPARK-15992
> URL: https://issues.apache.org/jira/browse/SPARK-15992
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>Assignee: Apache Spark
>  Labels: code-cleanup
> Attachments: 
> 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch
>
>
> The offer acceptance workflow is a little hard to follow and not very 
> extensible for future offer-handling considerations. This is a patch that makes 
> the workflow a little more explicit in its handling of offer resources.
> Patch incoming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15992:


Assignee: (was: Apache Spark)

> Code cleanup mesos coarse backend offer evaluation workflow
> ---
>
> Key: SPARK-15992
> URL: https://issues.apache.org/jira/browse/SPARK-15992
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>  Labels: code-cleanup
> Attachments: 
> 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch
>
>
> The offer acceptance workflow is a little hard to follow and not very 
> extensible for future offer-handling considerations. This is a patch that makes 
> the workflow a little more explicit in its handling of offer resources.
> Patch incoming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15992) Code cleanup mesos coarse backend offer evaluation workflow

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334364#comment-15334364
 ] 

Apache Spark commented on SPARK-15992:
--

User 'drcrallen' has created a pull request for this issue:
https://github.com/apache/spark/pull/13715

> Code cleanup mesos coarse backend offer evaluation workflow
> ---
>
> Key: SPARK-15992
> URL: https://issues.apache.org/jira/browse/SPARK-15992
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>  Labels: code-cleanup
> Attachments: 
> 0001-Refactor-MesosCoarseGrainedSchedulerBackend-offer-co.patch
>
>
> The offer acceptance workflow is a little hard to follow and not very 
> extensible for future offer-handling considerations. This is a patch that makes 
> the workflow a little more explicit in its handling of offer resources.
> Patch incoming



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15996) Fix R dataframe example by removing deprecated functions

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15996:


Assignee: (was: Apache Spark)

> Fix R dataframe example by removing deprecated functions
> 
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> Currently, the R dataframe example fails like the following. We had better update 
> it before releasing the 2.0 RC. This issue updates it to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15996) Fix R dataframe example by removing deprecated functions

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15996:


Assignee: Apache Spark

> Fix R dataframe example by removing deprecated functions
> 
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, the R dataframe example fails as shown below. We should update it 
> before releasing the 2.0 RC. This issue updates the example to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15996) Fix R dataframe example by removing deprecated functions

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334362#comment-15334362
 ] 

Apache Spark commented on SPARK-15996:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13714

> Fix R dataframe example by removing deprecated functions
> 
>
> Key: SPARK-15996
> URL: https://issues.apache.org/jira/browse/SPARK-15996
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> Currently, the R dataframe example fails as shown below. We should update it 
> before releasing the 2.0 RC. This issue updates the example to use up-to-date APIs.
> {code}
> $ bin/spark-submit examples/src/main/r/dataframe.R 
> ...
> Warning message:
> 'createDataFrame(sqlContext...)' is deprecated.
> Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
> See help("Deprecated") 
> ...
> Warning message:
> 'read.json(sqlContext...)' is deprecated.
> Use 'read.json(path)' instead.
> See help("Deprecated") 
> ...
> Error: could not find function "registerTempTable"
> Execution halted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15994:


Assignee: Apache Spark

> Allow enabling Mesos fetch cache in coarse executor backend 
> 
>
> Key: SPARK-15994
> URL: https://issues.apache.org/jira/browse/SPARK-15994
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
>Assignee: Apache Spark
> Attachments: 0001-Add-ability-to-enable-mesos-fetch-cache.patch
>
>
> Mesos 0.23.0 introduces a Fetch Cache feature 
> http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of 
> resources specified in command URIs.
> This patch:
> * Updates the Mesos shaded protobuf dependency to 0.23.0
> * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache 
> for all specified URIs. (URIs must be specified for the setting to have any 
> effect)
> * Updates documentation for Mesos configuration with the new setting.
> This patch does NOT:
> * Allow for per-URI caching configuration. The cache setting is global to ALL 
> URIs for the command.
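
A minimal Scala sketch of enabling the setting described above via SparkConf.
This only restates the key named in the issue; whether the exact key and
semantics land unchanged depends on the final patch.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("mesos-fetch-cache-example")
  // Per the description, resource URIs must still be supplied through the
  // usual settings for this flag to have any effect.
  .set("spark.mesos.fetchCache.enable", "true")
{code}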



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334356#comment-15334356
 ] 

Apache Spark commented on SPARK-15994:
--

User 'drcrallen' has created a pull request for this issue:
https://github.com/apache/spark/pull/13713

> Allow enabling Mesos fetch cache in coarse executor backend 
> 
>
> Key: SPARK-15994
> URL: https://issues.apache.org/jira/browse/SPARK-15994
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
> Attachments: 0001-Add-ability-to-enable-mesos-fetch-cache.patch
>
>
> Mesos 0.23.0 introduces a Fetch Cache feature 
> http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of 
> resources specified in command URIs.
> This patch:
> * Updates the Mesos shaded protobuf dependency to 0.23.0
> * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache 
> for all specified URIs. (URIs must be specified for the setting to have any 
> effect)
> * Updates documentation for Mesos configuration with the new setting.
> This patch does NOT:
> * Allow for per-URI caching configuration. The cache setting is global to ALL 
> URIs for the command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15994) Allow enabling Mesos fetch cache in coarse executor backend

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15994:


Assignee: (was: Apache Spark)

> Allow enabling Mesos fetch cache in coarse executor backend 
> 
>
> Key: SPARK-15994
> URL: https://issues.apache.org/jira/browse/SPARK-15994
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Charles Allen
> Attachments: 0001-Add-ability-to-enable-mesos-fetch-cache.patch
>
>
> Mesos 0.23.0 introduces a Fetch Cache feature 
> http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of 
> resources specified in command URIs.
> This patch:
> * Updates the Mesos shaded protobuf dependency to 0.23.0
> * Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache 
> for all specified URIs. (URIs must be specified for the setting to have any 
> effect)
> * Updates documentation for Mesos configuration with the new setting.
> This patch does NOT:
> * Allow for per-URI caching configuration. The cache setting is global to ALL 
> URIs for the command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15608) Add document for ML IsotonicRegression

2016-06-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-15608:
--
Issue Type: Documentation  (was: Sub-task)
Parent: (was: SPARK-15099)

> Add document for ML IsotonicRegression
> --
>
> Key: SPARK-15608
> URL: https://issues.apache.org/jira/browse/SPARK-15608
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Yanbo Liang
>Priority: Minor
>
> Feel free to copy the document from mllib to ml for IsotonicRegression, and 
> update it if necessary.
> Meanwhile, add examples and use "include_example" to include them in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15099) Audit: ml.regression

2016-06-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-15099:
--
Assignee: Yanbo Liang

> Audit: ml.regression
> 
>
> Key: SPARK-15099
> URL: https://issues.apache.org/jira/browse/SPARK-15099
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Audit this sub-package for new algorithms which do not have corresponding 
> sections & examples in the user guide.
> See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15099) Audit: ml.regression

2016-06-16 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-15099.
---
Resolution: Done

Marking as done since this JIRA is just for the audit.  Thanks [~yanboliang]!

> Audit: ml.regression
> 
>
> Key: SPARK-15099
> URL: https://issues.apache.org/jira/browse/SPARK-15099
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Audit this sub-package for new algorithms which do not have corresponding 
> sections & examples in the user guide.
> See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15996) Fix R dataframe example by removing deprecated functions

2016-06-16 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-15996:
-

 Summary: Fix R dataframe example by removing deprecated functions
 Key: SPARK-15996
 URL: https://issues.apache.org/jira/browse/SPARK-15996
 Project: Spark
  Issue Type: Bug
  Components: Examples
Reporter: Dongjoon Hyun
Priority: Minor


Currently, the R dataframe example fails as shown below. We should update it 
before releasing the 2.0 RC. This issue updates the example to use up-to-date APIs.
{code}
$ bin/spark-submit examples/src/main/r/dataframe.R 
...
Warning message:
'createDataFrame(sqlContext...)' is deprecated.
Use 'createDataFrame(data, schema = NULL, samplingRatio = 1.0)' instead.
See help("Deprecated") 
...
Warning message:
'read.json(sqlContext...)' is deprecated.
Use 'read.json(path)' instead.
See help("Deprecated") 
...
Error: could not find function "registerTempTable"
Execution halted
{code}
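
The fix itself is to the SparkR example, but the same Spark 2.0 API direction is
easiest to illustrate in Scala. Below is a hedged sketch only: the SparkSession
entry point, read.json(path), and createOrReplaceTempView are the 2.0
replacements the deprecation messages point at, and the JSON path assumes the
sample data shipped in the Spark examples tree.

{code}
import org.apache.spark.sql.SparkSession

// Spark 2.0 entry point; replaces the old SQLContext-first style used in the
// pre-2.0 example.
val spark = SparkSession.builder().appName("dataframe-example").getOrCreate()

// Old style went through sqlContext; in 2.0 the session reads directly.
val people = spark.read.json("examples/src/main/resources/people.json")

// registerTempTable is gone in 2.0; createOrReplaceTempView is the replacement.
people.createOrReplaceTempView("people")

val teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.show()
{code}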



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15100) Audit: ml.feature

2016-06-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334346#comment-15334346
 ] 

Joseph K. Bradley commented on SPARK-15100:
---

[~yuhaoyan]  Is it correct that you finished the audit of ml.feature?  Also, 
can you please make sure that there are subtasks for each of the issues 
identified during the audit & that they are linked here?  Then we can close 
this issue.  Thanks!

> Audit: ml.feature
> -
>
> Key: SPARK-15100
> URL: https://issues.apache.org/jira/browse/SPARK-15100
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Priority: Blocker
>
> Audit this sub-package for new algorithms which do not have corresponding 
> sections & examples in the user guide.
> See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15786) joinWith bytecode generation calling ByteBuffer.wrap with InternalRow

2016-06-16 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334331#comment-15334331
 ] 

Yin Huai commented on SPARK-15786:
--

Is there any chance we can let users know exactly what is wrong? This error 
message is much better than the previous one, but it still does not point out 
which part of the user code is not allowed.

> joinWith bytecode generation calling ByteBuffer.wrap with InternalRow
> -
>
> Key: SPARK-15786
> URL: https://issues.apache.org/jira/browse/SPARK-15786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Richard Marscher
>Assignee: Sean Zhong
> Fix For: 2.0.0
>
>
> {code}java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 36, Column 107: No applicable constructor/method found 
> for actual parameters "org.apache.spark.sql.catalyst.InternalRow"; candidates 
> are: "public static java.nio.ByteBuffer java.nio.ByteBuffer.wrap(byte[])", 
> "public static java.nio.ByteBuffer java.nio.ByteBuffer.wrap(byte[], int, 
> int)"{code}
> I have been trying to use joinWith along with Option data types to approximate 
> the RDD semantics for outer joins with Dataset and get a nicer API for Scala. 
> However, using the Dataset.as[] syntax leads to generated bytecode that passes 
> an InternalRow object into ByteBuffer.wrap, which expects a byte[] (optionally 
> with two int arguments).
> I have a notebook reproducing this against 2.0 preview in Databricks 
> Community Edition: 
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/160347920874755/1039589581260901/673639177603143/latest.html
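
For reference, a sketch of the kind of code the description refers to. The class
and column names here are made up; only the joinWith + Option-decoding shape
matters. On the affected versions this is the pattern that produced the codegen
error above.

{code}
import org.apache.spark.sql.SparkSession

case class User(id: Long, name: String)
case class Order(userId: Long, amount: Double)

val spark = SparkSession.builder().appName("joinWith-option-example").getOrCreate()
import spark.implicits._

val users  = Seq(User(1L, "a"), User(2L, "b")).toDS()
val orders = Seq(Order(1L, 10.0)).toDS()

// Outer join, then decode the possibly-missing right side as an Option to
// approximate RDD-style outer-join semantics.
val joined = users
  .joinWith(orders, users("id") === orders("userId"), "left_outer")
  .as[(User, Option[Order])]

joined.show()
{code}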



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15990) Support rolling log aggregation for Spark running on YARN

2016-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334320#comment-15334320
 ] 

Apache Spark commented on SPARK-15990:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/13712

> Support rolling log aggregation for Spark running on YARN
> -
>
> Key: SPARK-15990
> URL: https://issues.apache.org/jira/browse/SPARK-15990
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Saisai Shao
>Priority: Minor
>
> YARN supports rolling log aggregation since version 2.6+: it aggregates the 
> logs periodically and uploads them to HDFS. Compared to the previous approach, 
> which only aggregates the logs after the application has finished, this speeds 
> up log aggregation. It also avoids excessively large log files (and running 
> out of disk).
> So here we propose to introduce this feature for Spark on YARN.
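
For orientation only: to the best of my recollection, the YARN-side knob that
drives rolling aggregation is a NodeManager property normally set in
yarn-site.xml; treat the exact key below as an assumption to verify against the
Hadoop docs. It is shown through a Hadoop Configuration object purely to
illustrate the key. Any Spark-side setting this issue introduces is defined in
the linked pull request, not here.

{code}
import org.apache.hadoop.conf.Configuration

val yarnConf = new Configuration()
// Assumed property name for the NodeManager rolling-aggregation interval;
// normally configured cluster-wide in yarn-site.xml, not in application code.
yarnConf.set(
  "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds", "3600")
{code}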



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15990) Support rolling log aggregation for Spark running on YARN

2016-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15990:


Assignee: (was: Apache Spark)

> Support rolling log aggregation for Spark running on YARN
> -
>
> Key: SPARK-15990
> URL: https://issues.apache.org/jira/browse/SPARK-15990
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Saisai Shao
>Priority: Minor
>
> YARN supports rolling log aggregation since version 2.6+: it aggregates the 
> logs periodically and uploads them to HDFS. Compared to the previous approach, 
> which only aggregates the logs after the application has finished, this speeds 
> up log aggregation. It also avoids excessively large log files (and running 
> out of disk).
> So here we propose to introduce this feature for Spark on YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


