[jira] [Commented] (SPARK-14365) Repartition by column

2016-05-05 Thread Dmitriy Selivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273718#comment-15273718
 ] 

Dmitriy Selivanov commented on SPARK-14365:
---

I will be able to check it on 2016-05-11.

> Repartition by column
> -
>
> Key: SPARK-14365
> URL: https://issues.apache.org/jira/browse/SPARK-14365
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dmitriy Selivanov
>
> Starting from 1.6, it is possible to set partitioning for data frames. For 
> example, in Scala we can do it in the following way:
> {code}
> val partitioned = df.repartition($"k")
> {code}
> Would be nice to have this functionality in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273717#comment-15273717
 ] 

Yanbo Liang edited comment on SPARK-15136 at 5/6/16 6:51 AM:
-

For missing classes/methods/parameters, we can separate them into subtasks 
according to the ml components. But for the issues not relevant to specific ml 
components, we can do them together according to the issue topics. Thanks!


was (Author: yanboliang):
For missing classes/methods/parameters, we can separate them into subtasks 
according to the ml components. But for the issues not relevant to ml components, 
we can do them together according to the issue topics. Thanks!

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Commented] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273717#comment-15273717
 ] 

Yanbo Liang commented on SPARK-15136:
-

For missing classes/methods/parameters, we can separate them into subtasks 
according to the ml components. But for the issues not relevant to ml components, 
we can do them together according to the issue topics. Thanks!

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Commented] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273713#comment-15273713
 ] 

Apache Spark commented on SPARK-15136:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/12918

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Assigned] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15136:


Assignee: Apache Spark

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Assigned] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15136:


Assignee: (was: Apache Spark)

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Commented] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273709#comment-15273709
 ] 

holdenk commented on SPARK-15136:
-

Seems reasonable, I've closed the two sub-tasks and I'll switch the PR to this 
JIRA directly. Just wanted to break it up so whoever was reviewing the other ml 
components could do it at their own pace.

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Closed] (SPARK-15138) Linkify ML PyDoc regression

2016-05-05 Thread holdenk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk closed SPARK-15138.
---
Resolution: Duplicate

> Linkify ML PyDoc regression
> ---
>
> Key: SPARK-15138
> URL: https://issues.apache.org/jira/browse/SPARK-15138
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Priority: Minor
>







[jira] [Closed] (SPARK-15137) Linkify ML PyDoc classification

2016-05-05 Thread holdenk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk closed SPARK-15137.
---
Resolution: Duplicate

> Linkify ML PyDoc classification
> ---
>
> Key: SPARK-15137
> URL: https://issues.apache.org/jira/browse/SPARK-15137
> Project: Spark
>  Issue Type: Sub-task
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Commented] (SPARK-15163) Mark experimental algorithms experimental in PySpark

2016-05-05 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273705#comment-15273705
 ] 

holdenk commented on SPARK-15163:
-

I think you were talking about https://issues.apache.org/jira/browse/SPARK-15136

> Mark experimental algorithms experimental in PySpark
> 
>
> Key: SPARK-15163
> URL: https://issues.apache.org/jira/browse/SPARK-15163
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> While we are going through them anyway, we might as well mark as experimental 
> the PySpark algorithms that are marked so in Scala.






[jira] [Commented] (SPARK-15163) Mark experimental algorithms experimental in PySpark

2016-05-05 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273698#comment-15273698
 ] 

Yanbo Liang commented on SPARK-15163:
-

[~holdenk] I think this is the kind of thing we want to do in pretty big 
sweeps, so it's better to switch all non-standard PyDoc links to standard in a 
single task.

> Mark experimental algorithms experimental in PySpark
> 
>
> Key: SPARK-15163
> URL: https://issues.apache.org/jira/browse/SPARK-15163
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> While we are going through them anyway, we might as well mark as experimental 
> the PySpark algorithms that are marked so in Scala.






[jira] [Commented] (SPARK-15136) Linkify ML PyDoc

2016-05-05 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273695#comment-15273695
 ] 

Yanbo Liang commented on SPARK-15136:
-

[~holdenk] I think this is the kind of thing we want to do in pretty big 
sweeps, so it's better to switch all non-standard PyDoc links to standard in a 
single task.

> Linkify ML PyDoc
> 
>
> Key: SPARK-15136
> URL: https://issues.apache.org/jira/browse/SPARK-15136
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> PyDoc links in ml are in non-standard format. Switch to standard sphinx link 
> format for better formatted documentation.






[jira] [Assigned] (SPARK-15173) DataFrameWriter.insertInto should work with datasource table stored in hive

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15173:


Assignee: Wenchen Fan  (was: Apache Spark)

> DataFrameWriter.insertInto should work with datasource table stored in hive
> ---
>
> Key: SPARK-15173
> URL: https://issues.apache.org/jira/browse/SPARK-15173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Assigned] (SPARK-15173) DataFrameWriter.insertInto should work with datasource table stored in hive

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15173:


Assignee: Apache Spark  (was: Wenchen Fan)

> DataFrameWriter.insertInto should work with datasource table stored in hive
> ---
>
> Key: SPARK-15173
> URL: https://issues.apache.org/jira/browse/SPARK-15173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-15173) DataFrameWriter.insertInto should work with datasource table stored in hive

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273693#comment-15273693
 ] 

Apache Spark commented on SPARK-15173:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/12949

> DataFrameWriter.insertInto should work with datasource table stored in hive
> ---
>
> Key: SPARK-15173
> URL: https://issues.apache.org/jira/browse/SPARK-15173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Created] (SPARK-15174) DataFrame does not have correct number of rows after dropDuplicates

2016-05-05 Thread Ian Hellstrom (JIRA)
Ian Hellstrom created SPARK-15174:
-

 Summary: DataFrame does not have correct number of rows after 
dropDuplicates
 Key: SPARK-15174
 URL: https://issues.apache.org/jira/browse/SPARK-15174
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
Reporter: Ian Hellstrom


If you read an empty file/folder with the {{SQLContext.read()}} function and 
call {{DataFrame.dropDuplicates()}}, the number of rows is incorrect.

{code:scala}
val input = "hdfs:///some/empty/directory"
val df1 = sqlContext.read.json(input)
val df2 = sqlContext.read.json(input).dropDuplicates

df1.count == 0 // true
df1.rdd.isEmpty // true

df2.count == 0 // false: it's actually reported as 1
df2.rdd.isEmpty // false
{code:scala}






[jira] [Updated] (SPARK-15174) DataFrame does not have correct number of rows after dropDuplicates

2016-05-05 Thread Ian Hellstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Hellstrom updated SPARK-15174:
--
Description: 
If you read an empty file/folder with the {{SQLContext.read()}} function and 
call {{DataFrame.dropDuplicates()}}, the number of rows is incorrect.

{code}
val input = "hdfs:///some/empty/directory"
val df1 = sqlContext.read.json(input)
val df2 = sqlContext.read.json(input).dropDuplicates

df1.count == 0 // true
df1.rdd.isEmpty // true

df2.count == 0 // false: it's actually reported as 1
df2.rdd.isEmpty // false
{code}

  was:
If you read an empty file/folder with the {{SQLContext.read()}} function and 
call {{DataFrame.dropDuplicates()}}, the number of rows is incorrect.

{code:scala}
val input = "hdfs:///some/empty/directory"
val df1 = sqlContext.read.json(input)
val df2 = sqlContext.read.json(input).dropDuplicates

df1.count == 0 // true
df1.rdd.isEmpty // true

df2.count == 0 // false: it's actually reported as 1
df2.rdd.isEmpty // false
{code:scala}


> DataFrame does not have correct number of rows after dropDuplicates
> ---
>
> Key: SPARK-15174
> URL: https://issues.apache.org/jira/browse/SPARK-15174
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
>Reporter: Ian Hellstrom
>
> If you read an empty file/folder with the {{SQLContext.read()}} function and 
> call {{DataFrame.dropDuplicates()}}, the number of rows is incorrect.
> {code}
> val input = "hdfs:///some/empty/directory"
> val df1 = sqlContext.read.json(input)
> val df2 = sqlContext.read.json(input).dropDuplicates
> df1.count == 0 // true
> df1.rdd.isEmpty // true
> df2.count == 0 // false: it's actually reported as 1
> df2.rdd.isEmpty // false
> {code}






[jira] [Created] (SPARK-15173) DataFrameWriter.insertInto should work with datasource table stored in hive

2016-05-05 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-15173:
---

 Summary: DataFrameWriter.insertInto should work with datasource 
table stored in hive
 Key: SPARK-15173
 URL: https://issues.apache.org/jira/browse/SPARK-15173
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Commented] (SPARK-15172) Warning message should explicitly tell user initial coefficients is ignored if its size doesn't match expected size in LogisticRegression

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273685#comment-15273685
 ] 

Apache Spark commented on SPARK-15172:
--

User 'dding3' has created a pull request for this issue:
https://github.com/apache/spark/pull/12948

> Warning message should explicitly tell user initial coefficients is ignored 
> if its size doesn't match expected size in LogisticRegression
> -
>
> Key: SPARK-15172
> URL: https://issues.apache.org/jira/browse/SPARK-15172
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: ding
>Priority: Trivial
>
> From the ML LogisticRegression code logic, if the size of the initial 
> coefficients doesn't match the expected size, the initial coefficients value 
> will be ignored. We should explicitly tell the user this. Besides, logging the 
> size of the initial coefficients is more straightforward than logging the 
> initial coefficients value when a size mismatch happens.







[jira] [Assigned] (SPARK-15172) Warning message should explicitly tell user initial coefficients is ignored if its size doesn't match expected size in LogisticRegression

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15172:


Assignee: (was: Apache Spark)

> Warning message should explicitly tell user initial coefficients is ignored 
> if its size doesn't match expected size in LogisticRegression
> -
>
> Key: SPARK-15172
> URL: https://issues.apache.org/jira/browse/SPARK-15172
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: ding
>Priority: Trivial
>
> From the ML LogisticRegression code logic, if the size of the initial 
> coefficients doesn't match the expected size, the initial coefficients value 
> will be ignored. We should explicitly tell the user this. Besides, logging the 
> size of the initial coefficients is more straightforward than logging the 
> initial coefficients value when a size mismatch happens.






[jira] [Assigned] (SPARK-15172) Warning message should explicitly tell user initial coefficients is ignored if its size doesn't match expected size in LogisticRegression

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15172:


Assignee: Apache Spark

> Warning message should explicitly tell user initial coefficients is ignored 
> if its size doesn't match expected size in LogisticRegression
> -
>
> Key: SPARK-15172
> URL: https://issues.apache.org/jira/browse/SPARK-15172
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: ding
>Assignee: Apache Spark
>Priority: Trivial
>
> From the ML LogisticRegression code logic, if the size of the initial 
> coefficients doesn't match the expected size, the initial coefficients value 
> will be ignored. We should explicitly tell the user this. Besides, logging the 
> size of the initial coefficients is more straightforward than logging the 
> initial coefficients value when a size mismatch happens.






[jira] [Created] (SPARK-15172) Warning message should explicitly tell user initial coefficients is ignored if its size doesn't match expected size in LogisticRegression

2016-05-05 Thread ding (JIRA)
ding created SPARK-15172:


 Summary: Warning message should explicitly tell user initial 
coefficients is ignored if its size doesn't match expected size in 
LogisticRegression
 Key: SPARK-15172
 URL: https://issues.apache.org/jira/browse/SPARK-15172
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: ding
Priority: Trivial


From the ML LogisticRegression code logic, if the size of the initial coefficients 
doesn't match the expected size, the initial coefficients value will be ignored. 
We should explicitly tell the user this. Besides, logging the size of the initial 
coefficients is more straightforward than logging the initial coefficients value 
when a size mismatch happens.
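
A minimal sketch of the kind of check described above; the variable and method 
names are illustrative, not the actual LogisticRegression fields:

{code}
// Hedged sketch: warn with the sizes (not the values) when the supplied
// initial coefficients will be ignored. Names are hypothetical.
def checkInitialCoefficients(initialCoefficients: Array[Double], numFeatures: Int): Boolean = {
  val ok = initialCoefficients.length == numFeatures
  if (!ok) {
    println(s"WARN: initial coefficients will be ignored; their size " +
      s"${initialCoefficients.length} does not match the expected size $numFeatures")
  }
  ok
}
{code}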






[jira] [Commented] (SPARK-14365) Repartition by column

2016-05-05 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273664#comment-15273664
 ] 

Sun Rui commented on SPARK-14365:
-

[~dselivanov] Could you verify if SPARK-15110 can solve your problem?

> Repartition by column
> -
>
> Key: SPARK-14365
> URL: https://issues.apache.org/jira/browse/SPARK-14365
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dmitriy Selivanov
>
> Starting from 1.6, it is possible to set partitioning for data frames. For 
> example, in Scala we can do it in the following way:
> {code}
> val partitioned = df.repartition($"k")
> {code}
> Would be nice to have this functionality in SparkR.






[jira] [Closed] (SPARK-14365) Repartition by column

2016-05-05 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui closed SPARK-14365.
---
Resolution: Duplicate

> Repartition by column
> -
>
> Key: SPARK-14365
> URL: https://issues.apache.org/jira/browse/SPARK-14365
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Dmitriy Selivanov
>
> Starting from 1.6, it is possible to set partitioning for data frames. For 
> example, in Scala we can do it in the following way:
> {code}
> val partitioned = df.repartition($"k")
> {code}
> Would be nice to have this functionality in SparkR.






[jira] [Commented] (SPARK-15159) Remove usage of HiveContext in SparkR.

2016-05-05 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273655#comment-15273655
 ] 

Sun Rui commented on SPARK-15159:
-

[~felixcheung], I guess you are talking about SQLContext, not HiveContext. 
SQLContext is kept for backward compatibility, so we don't need to change it for 
now.

HiveContext is deprecated, not removed. However, I don't think it is a big 
change. There are only two pieces:
1. Modify SparkRHive.init() to use SparkSession;
2. Investigate whether we need to change the use of TestHiveContext in the SparkR 
unit tests. A rough look suggests no change is needed, but I'm not sure.

[~vsparmar] Feel free to take this JIRA.

> Remove usage of HiveContext in SparkR.
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> HiveContext is to be deprecated in 2.0. Replace it with 
> SparkSession.withHiveSupport in SparkR.






[jira] [Commented] (SPARK-14476) Show table name or path in string of DataSourceScan

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273632#comment-15273632
 ] 

Apache Spark commented on SPARK-14476:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/12947

> Show table name or path in string of DataSourceScan
> ---
>
> Key: SPARK-14476
> URL: https://issues.apache.org/jira/browse/SPARK-14476
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Cheng Lian
>Priority: Critical
>
> Right now, the string of DataSourceScan is only "HadoopFiles xxx", without 
> any information about the table name or path. 
> Since we had that in 1.6, this is kind of a regression.






[jira] [Commented] (SPARK-14476) Show table name or path in string of DataSourceScan

2016-05-05 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273630#comment-15273630
 ] 

Sean Zhong commented on SPARK-14476:


Regression of SPARK-12012

> Show table name or path in string of DataSourceScan
> ---
>
> Key: SPARK-14476
> URL: https://issues.apache.org/jira/browse/SPARK-14476
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Cheng Lian
>Priority: Critical
>
> Right now, the string of DataSourceScan is only "HadoopFiles xxx", without 
> any information about the table name or path. 
> Since we had that in 1.6, this is kind of a regression.






[jira] [Assigned] (SPARK-14476) Show table name or path in string of DataSourceScan

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14476:


Assignee: Cheng Lian  (was: Apache Spark)

> Show table name or path in string of DataSourceScan
> ---
>
> Key: SPARK-14476
> URL: https://issues.apache.org/jira/browse/SPARK-14476
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Cheng Lian
>Priority: Critical
>
> Right now, the string of DataSourceScan is only "HadoopFiles xxx", without 
> any information about the table name or path. 
> Since we had that in 1.6, this is kind of a regression.






[jira] [Assigned] (SPARK-14476) Show table name or path in string of DataSourceScan

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14476:


Assignee: Apache Spark  (was: Cheng Lian)

> Show table name or path in string of DataSourceScan
> ---
>
> Key: SPARK-14476
> URL: https://issues.apache.org/jira/browse/SPARK-14476
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>Priority: Critical
>
> Right now, the string of DataSourceScan is only "HadoopFiles xxx", without 
> any information about the table name or path. 
> Since we had that in 1.6, this is kind of a regression.






[jira] [Commented] (SPARK-15085) Rename current streaming-kafka artifact to include kafka version

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273619#comment-15273619
 ] 

Apache Spark commented on SPARK-15085:
--

User 'koeninger' has created a pull request for this issue:
https://github.com/apache/spark/pull/12946

> Rename current streaming-kafka artifact to include kafka version
> 
>
> Key: SPARK-15085
> URL: https://issues.apache.org/jira/browse/SPARK-15085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
>
> Since supporting Kafka 0.10 will likely need a separate artifact, rename the 
> existing artifact now so that the minor breaking change is in place for Spark 
> 2.0.
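
For illustration, a build dependency after such a rename might look like the 
sketch below; the exact artifact id and version are decided in the linked pull 
request, not here:

{code}
// build.sbt sketch: "spark-streaming-kafka-0-8" is assumed here only as an
// example of a Kafka-version-suffixed artifact name.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
{code}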






[jira] [Assigned] (SPARK-15085) Rename current streaming-kafka artifact to include kafka version

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15085:


Assignee: (was: Apache Spark)

> Rename current streaming-kafka artifact to include kafka version
> 
>
> Key: SPARK-15085
> URL: https://issues.apache.org/jira/browse/SPARK-15085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
>
> Since supporting Kafka 0.10 will likely need a separate artifact, rename the 
> existing artifact now so that the minor breaking change is in place for Spark 
> 2.0.






[jira] [Assigned] (SPARK-15085) Rename current streaming-kafka artifact to include kafka version

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15085:


Assignee: Apache Spark

> Rename current streaming-kafka artifact to include kafka version
> 
>
> Key: SPARK-15085
> URL: https://issues.apache.org/jira/browse/SPARK-15085
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Cody Koeninger
>Assignee: Apache Spark
>
> Since supporting Kafka 0.10 will likely need a separate artifact, rename the 
> existing artifact now so that the minor breaking change is in place for Spark 
> 2.0.






[jira] [Updated] (SPARK-14809) R Examples: Check for new R APIs requiring example code in 2.0

2016-05-05 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14809:
--
Assignee: Yanbo Liang

> R Examples: Check for new R APIs requiring example code in 2.0
> --
>
> Key: SPARK-14809
> URL: https://issues.apache.org/jira/browse/SPARK-14809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Minor
>
> Audit list of new features added to MLlib's R API, and see which major items 
> are missing example code (in the examples folder).  We do not need examples 
> for everything, only for major items such as new algorithms.
> For any such items:
> * Create a JIRA for that feature, and assign it to the author of the feature 
> (or yourself if interested).
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> Note: This no longer includes Scala/Java/Python since those are covered under 
> the user guide.






[jira] [Updated] (SPARK-15155) Optionally ignore default role resources

2016-05-05 Thread Chris Heller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Heller updated SPARK-15155:
-
Description: 
SPARK-6284 added support for Mesos roles, but the framework will still accept 
resources from both the reserved role specified in {{spark.mesos.role}} and the 
default role {{*}}.

I'd like to propose the addition of a new boolean property: 
{{spark.mesos.ignoreDefaultRoleResources}}. When this property is set Spark 
will only accept resources from the role passed in the {{spark.mesos.role}} 
property. If {{spark.mesos.role}} has not been set, 
{{spark.mesos.ignoreDefaultRoleResources}} has no effect.

  was:
SPARK-6284 added support for Mesos roles, but the framework will still accept 
resources from both the reserved role specified in {{spark.mesos.role}} and the 
default role {{*}}.

I'd like to propose the addition of a new property 
{{spark.mesos.acceptedResourceRoles}} which would be a comma-delimited list of 
roles that the framework will accept resources from.

This is similar to {{spark.mesos.constraints}}, except that constraints look at 
the attributes of an offer, and this will look at the role of a resource.

In the default case {{spark.mesos.acceptedResourceRoles}} will be set to 
{{*[,spark.mesos.role]}} giving the exact same behavior to the framework if no 
value is specified in the property.


> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.
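
As a hedged illustration of how the proposed flag might be used from an 
application: the property name below is only the one proposed in this issue and 
may not exist in any released Spark version.

{code}
import org.apache.spark.SparkConf

object MesosRoleConfExample {
  val conf = new SparkConf()
    .setAppName("mesos-role-example")
    .set("spark.mesos.role", "spark")                      // accept resources reserved for this role
    .set("spark.mesos.ignoreDefaultRoleResources", "true") // proposed flag: skip offers from the default role "*"
}
{code}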






[jira] [Updated] (SPARK-15155) Optionally ignore default role resources

2016-05-05 Thread Chris Heller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Heller updated SPARK-15155:
-
Summary: Optionally ignore default role resources  (was: Selectively accept 
Mesos resources by role)

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new property 
> {{spark.mesos.acceptedResourceRoles}} which would be a comma-delimited list 
> of roles that the framework will accept resources from.
> This is similar to {{spark.mesos.constraints}}, except that constraints look 
> at the attributes of an offer, and this will look at the role of a resource.
> In the default case {{spark.mesos.acceptedResourceRoles}} will be set to 
> {{*[,spark.mesos.role]}} giving the exact same behavior to the framework if 
> no value is specified in the property.






[jira] [Commented] (SPARK-15171) Deprecate registerTempTable and add dataset.createTempView

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273544#comment-15273544
 ] 

Apache Spark commented on SPARK-15171:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/12945

> Deprecate registerTempTable and add dataset.createTempView
> --
>
> Key: SPARK-15171
> URL: https://issues.apache.org/jira/browse/SPARK-15171
> Project: Spark
>  Issue Type: Bug
>Reporter: Sean Zhong
>Priority: Minor
>
> Our current dataset.registerTempTable does not actually materialize data. So, 
> it should be considered as creating a temp view. We can deprecate it and 
> create a new method called dataset.createTempView(replaceIfExists: Boolean). 
> The default value of replaceIfExists should be false. For registerTempTable, 
> it will call dataset.createTempView(replaceIfExists = true).






[jira] [Assigned] (SPARK-15171) Deprecate registerTempTable and add dataset.createTempView

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15171:


Assignee: Apache Spark

> Deprecate registerTempTable and add dataset.createTempView
> --
>
> Key: SPARK-15171
> URL: https://issues.apache.org/jira/browse/SPARK-15171
> Project: Spark
>  Issue Type: Bug
>Reporter: Sean Zhong
>Assignee: Apache Spark
>Priority: Minor
>
> Our current dataset.registerTempTable does not actually materialize data. So, 
> it should be considered as creating a temp view. We can deprecate it and 
> create a new method called dataset.createTempView(replaceIfExists: Boolean). 
> The default value of replaceIfExists should be false. For registerTempTable, 
> it will call dataset.createTempView(replaceIfExists = true).






[jira] [Assigned] (SPARK-15171) Deprecate registerTempTable and add dataset.createTempView

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15171:


Assignee: (was: Apache Spark)

> Deprecate registerTempTable and add dataset.createTempView
> --
>
> Key: SPARK-15171
> URL: https://issues.apache.org/jira/browse/SPARK-15171
> Project: Spark
>  Issue Type: Bug
>Reporter: Sean Zhong
>Priority: Minor
>
> Our current dataset.registerTempTable does not actually materialize data. So, 
> it should be considered as creating a temp view. We can deprecate it and 
> create a new method called dataset.createTempView(replaceIfExists: Boolean). 
> The default value of replaceIfExists should be false. For registerTempTable, 
> it will call dataset.createTempView(replaceIfExists = true).






[jira] [Commented] (SPARK-8428) TimSort Comparison method violates its general contract with CLUSTER BY

2016-05-05 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273534#comment-15273534
 ] 

Yi Zhou commented on SPARK-8428:


We found a similar issue with Spark 1.6.1 in our larger data-size test. I posted 
the details below. We then tried increasing spark.sql.shuffle.partitions to work 
around it.

{code}
CREATE TABLE q26_spark_sql_run_query_0_temp (
  cid  BIGINT,
  id1  double,
  id2  double,
  id3  double,
  id4  double,
  id5  double,
  id6  double,
  id7  double,
  id8  double,
  id9  double,
  id10 double,
  id11 double,
  id12 double,
  id13 double,
  id14 double,
  id15 double
)

INSERT INTO TABLE q26_spark_sql_run_query_0_temp
SELECT
  ss.ss_customer_sk AS cid,
  count(CASE WHEN i.i_class_id=1  THEN 1 ELSE NULL END) AS id1,
  count(CASE WHEN i.i_class_id=2  THEN 1 ELSE NULL END) AS id2,
  count(CASE WHEN i.i_class_id=3  THEN 1 ELSE NULL END) AS id3,
  count(CASE WHEN i.i_class_id=4  THEN 1 ELSE NULL END) AS id4,
  count(CASE WHEN i.i_class_id=5  THEN 1 ELSE NULL END) AS id5,
  count(CASE WHEN i.i_class_id=6  THEN 1 ELSE NULL END) AS id6,
  count(CASE WHEN i.i_class_id=7  THEN 1 ELSE NULL END) AS id7,
  count(CASE WHEN i.i_class_id=8  THEN 1 ELSE NULL END) AS id8,
  count(CASE WHEN i.i_class_id=9  THEN 1 ELSE NULL END) AS id9,
  count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10,
  count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11,
  count(CASE WHEN i.i_class_id=12 THEN 1 ELSE NULL END) AS id12,
  count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13,
  count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14,
  count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15
FROM store_sales ss
INNER JOIN item i
  ON (ss.ss_item_sk = i.i_item_sk
  AND i.i_category IN ('Books')
  AND ss.ss_customer_sk IS NOT NULL
)
GROUP BY ss.ss_customer_sk
HAVING count(ss.ss_item_sk) > 5
ORDER BY cid
{code}
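
As an aside, the workaround mentioned above (increasing 
spark.sql.shuffle.partitions) can be set programmatically before re-running the 
query; a minimal sketch, assuming an existing sqlContext and using an example 
value that should be tuned for the data size:

{code}
sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
{code}

The failure this works around is shown in the stack trace that follows.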

{code}
16/05/05 14:50:03 WARN scheduler.TaskSetManager: Lost task 12.0 in stage 162.0 
(TID 15153, node6): java.lang.IllegalArgumentException: Comparison method 
violates its
general contract!
at 
org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:794)
at 
org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525)
at 
org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453)
at 
org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:228)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:186)
at 
org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:175)
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
at 
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:295)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:330)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.

[jira] [Commented] (SPARK-15032) When we create a new JDBC session, we may need to create a new session of executionHive

2016-05-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273529#comment-15273529
 ] 

Yin Huai commented on SPARK-15032:
--

Can you explain more about "I think the problem is that it terminates the 
executionHive process"? I am not sure I understand this. Thanks!

> When we create a new JDBC session, we may need to create a new session of 
> executionHive
> ---
>
> Key: SPARK-15032
> URL: https://issues.apache.org/jira/browse/SPARK-15032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> Right now, we only use executionHive in thriftserver. When we create a new 
> jdbc session, we probably need to create a new session of executionHive. I am 
> not sure what will break if we leave the code as is. But, I feel it will be 
> safer to create a new session of executionHive.






[jira] [Created] (SPARK-15171) Deprecate registerTempTable and add dataset.createTempView

2016-05-05 Thread Sean Zhong (JIRA)
Sean Zhong created SPARK-15171:
--

 Summary: Deprecate registerTempTable and add dataset.createTempView
 Key: SPARK-15171
 URL: https://issues.apache.org/jira/browse/SPARK-15171
 Project: Spark
  Issue Type: Bug
Reporter: Sean Zhong
Priority: Minor


Our current dataset.registerTempTable does not actually materialize data. So, 
it should be considered as creating a temp view. We can deprecate it and create 
a new method called dataset.createTempView(replaceIfExists: Boolean). The 
default value of replaceIfExists should be false. For registerTempTable, it 
will call dataset.createTempView(replaceIfExists = true).
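
A sketch of the proposed API shape, for illustration only; the final method 
names and defaults are settled in the pull request, not here:

{code}
trait DatasetTempViewSupport {
  // Proposed: create a temporary view; by default fail if a view with the
  // same name already exists.
  def createTempView(viewName: String, replaceIfExists: Boolean = false): Unit

  // Kept for compatibility; behaves like the old registerTempTable by
  // silently replacing an existing view.
  @deprecated("Use createTempView instead", "2.0.0")
  def registerTempTable(tableName: String): Unit =
    createTempView(tableName, replaceIfExists = true)
}
{code}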






[jira] [Commented] (SPARK-14809) R Examples: Check for new R APIs requiring example code in 2.0

2016-05-05 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273492#comment-15273492
 ] 

Yanbo Liang commented on SPARK-14809:
-

I'm glad to help with this.

> R Examples: Check for new R APIs requiring example code in 2.0
> --
>
> Key: SPARK-14809
> URL: https://issues.apache.org/jira/browse/SPARK-14809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Audit list of new features added to MLlib's R API, and see which major items 
> are missing example code (in the examples folder).  We do not need examples 
> for everything, only for major items such as new algorithms.
> For any such items:
> * Create a JIRA for that feature, and assign it to the author of the feature 
> (or yourself if interested).
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> Note: This no longer includes Scala/Java/Python since those are covered under 
> the user guide.






[jira] [Resolved] (SPARK-11395) Support over and window specification in SparkR

2016-05-05 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-11395.
---
   Resolution: Fixed
 Assignee: Sun Rui
Fix Version/s: 2.0.0

Resolved by https://github.com/apache/spark/pull/10094

> Support over and window specification in SparkR
> ---
>
> Key: SPARK-11395
> URL: https://issues.apache.org/jira/browse/SPARK-11395
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.0
>
>
> 1. implement over() in Column class.
> 2. support window spec 
> (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.expressions.WindowSpec)
> 3. support utility functions for defining window in DataFrames. 
> (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.expressions.Window)
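
For reference, the Scala API that these SparkR additions mirror looks like the 
sketch below (assuming an existing DataFrame df with columns k and v; the SparkR 
method names may differ slightly):

{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank

val w = Window.partitionBy("k").orderBy("v")    // the window specification
val ranked = df.withColumn("r", rank().over(w)) // a window function applied over it
{code}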






[jira] [Commented] (SPARK-10043) Add window functions into SparkR

2016-05-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273475#comment-15273475
 ] 

Shivaram Venkataraman commented on SPARK-10043:
---

[~sunrui] Can we resolve this issue now?

> Add window functions into SparkR
> 
>
> Key: SPARK-10043
> URL: https://issues.apache.org/jira/browse/SPARK-10043
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>
> Add window functions as follows in SparkR. I think we should also improve the 
> {{collect}} function in SparkR.
> - lead
> - cumuDist
> - denseRank
> - lag
> - ntile
> - percentRank
> - rank
> - rowNumber






[jira] [Created] (SPARK-15170) Log error message in ExecutorAllocationManager

2016-05-05 Thread meiyoula (JIRA)
meiyoula created SPARK-15170:


 Summary: Log error message in ExecutorAllocationManager
 Key: SPARK-15170
 URL: https://issues.apache.org/jira/browse/SPARK-15170
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: meiyoula


No matter how long an executor has actually been idle, the log just says "it has 
been idle for $executorIdleTimeoutS seconds". Because executorIdleTimeoutS = 
conf.getTimeAsSeconds("spark.dynamicAllocation.executorIdleTimeout", "60s"), the 
same idle time is logged for every executor.
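
Simplified illustration of the complaint (not the actual 
ExecutorAllocationManager code; conf, logInfo and executorId are assumed from the 
surrounding class): the number of seconds printed comes from the configuration, 
so every removed executor logs the same value.

{code}
val executorIdleTimeoutS =
  conf.getTimeAsSeconds("spark.dynamicAllocation.executorIdleTimeout", "60s")
logInfo(s"Removing executor $executorId because it has been idle for " +
  s"$executorIdleTimeoutS seconds")  // same value regardless of actual idle time
{code}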






[jira] [Commented] (SPARK-15074) Spark shuffle service bottlenecked while fetching large amount of intermediate data

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273416#comment-15273416
 ] 

Apache Spark commented on SPARK-15074:
--

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/12944

> Spark shuffle service bottlenecked while fetching large amount of 
> intermediate data
> ---
>
> Key: SPARK-15074
> URL: https://issues.apache.org/jira/browse/SPARK-15074
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>
> While running a job which produces more than 90TB of intermediate data, we 
> find that about 10-15% of the reducer execution time is being spent in 
> shuffle fetch. 
> Jstack of the shuffle service reveals that most of the time the shuffle 
> service is reading the index files generated by the mapper. 
> {code}
> java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.DataInputStream.readFully(DataInputStream.java:195)
>   at java.io.DataInputStream.readLong(DataInputStream.java:416)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:277)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:190)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue is that for each shuffle fetch, we reopen and read the same index 
> file. It would be much more efficient if we could avoid opening the same 
> file multiple times and instead cache the data. We can use an LRU cache to 
> store the index file information. This way we can also bound the number of 
> entries in the cache so that memory use does not grow indefinitely.
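
A minimal Scala sketch of the access-ordered LRU idea described above; the class
name and type parameters are placeholders, not the shuffle service's actual code:

{code}
import java.util.{LinkedHashMap, Map => JMap}

// Illustrative only: an access-ordered map that evicts the least recently used
// entry once maxEntries is exceeded. The shuffle service would key this by
// index file path and cache the parsed offsets so each file is read once.
class LruIndexCache[K, V](maxEntries: Int)
  extends LinkedHashMap[K, V](16, 0.75f, /* accessOrder = */ true) {

  override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > maxEntries
}
{code}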



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---

[jira] [Assigned] (SPARK-15074) Spark shuffle service bottlenecked while fetching large amount of intermediate data

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15074:


Assignee: Apache Spark

> Spark shuffle service bottlenecked while fetching large amount of 
> intermediate data
> ---
>
> Key: SPARK-15074
> URL: https://issues.apache.org/jira/browse/SPARK-15074
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>Assignee: Apache Spark
>
> While running a job which produces more than 90TB of intermediate data, we 
> find that about 10-15% of the reducer execution time is being spent in 
> shuffle fetch. 
> Jstack of the shuffle service reveals that most of the time the shuffle 
> service is reading the index files generated by the mapper. 
> {code}
> java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.DataInputStream.readFully(DataInputStream.java:195)
>   at java.io.DataInputStream.readLong(DataInputStream.java:416)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:277)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:190)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue is that for each shuffle fetch, we reopen and read the same index 
> file. It would be much more efficient if we could avoid opening the same 
> file multiple times and instead cache the data. We can use an LRU cache to 
> store the index file information. This way we can also bound the number of 
> entries in the cache so that memory use does not grow indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-15074) Spark shuffle service bottlenecked while fetching large amount of intermediate data

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15074:


Assignee: (was: Apache Spark)

> Spark shuffle service bottlenecked while fetching large amount of 
> intermediate data
> ---
>
> Key: SPARK-15074
> URL: https://issues.apache.org/jira/browse/SPARK-15074
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>
> While running a job which produces more than 90TB of intermediate data, we 
> find that about 10-15% of the reducer execution time is being spent in 
> shuffle fetch. 
> Jstack of the shuffle service reveals that most of the time the shuffle 
> service is reading the index files generated by the mapper. 
> {code}
> java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.DataInputStream.readFully(DataInputStream.java:195)
>   at java.io.DataInputStream.readLong(DataInputStream.java:416)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:277)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:190)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue is that for each shuffle fetch, we reopen and read the same index 
> file. It would be much more efficient if we could avoid opening the same 
> file multiple times and instead cache the data. We can use an LRU cache to 
> store the index file information. This way we can also bound the number of 
> entries in the cache so that memory use does not grow indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-14963) YarnShuffleService should use YARN getRecoveryPath() for leveldb location

2016-05-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273412#comment-15273412
 ] 

Saisai Shao commented on SPARK-14963:
-

OK, I will do it.

> YarnShuffleService should use YARN getRecoveryPath() for leveldb location
> -
>
> Key: SPARK-14963
> URL: https://issues.apache.org/jira/browse/SPARK-14963
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, YARN
>Affects Versions: 1.6.1
>Reporter: Thomas Graves
>
> The YarnShuffleService currently just picks a directory in the YARN local 
> dirs to store the leveldb file. YARN added an interface in Hadoop 2.5, 
> getRecoveryPath(), to get the location where it should be storing this.
> We should change to use getRecoveryPath(). This does mean we will have to use 
> reflection or similar to check for its existence, though, since it doesn't 
> exist before Hadoop 2.5.
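
A rough Scala sketch of the reflection check mentioned above; the `context`
argument stands in for whatever YARN object exposes getRecoveryPath() and is an
assumption, not the actual service API:

{code}
// Illustrative only: look up getRecoveryPath() reflectively so the same code
// still runs against Hadoop versions older than 2.5, where the method is absent.
def recoveryPath(context: AnyRef): Option[AnyRef] =
  try {
    Option(context.getClass.getMethod("getRecoveryPath").invoke(context))
  } catch {
    case _: NoSuchMethodException => None  // pre-2.5 Hadoop: fall back to the local dirs
  }
{code}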



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15159) Remove usage of HiveContext in SparkR.

2016-05-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273396#comment-15273396
 ] 

Shivaram Venkataraman commented on SPARK-15159:
---

Is it being removed or is it being deprecated in 2.0? If it's being removed, 
then we need to make this a priority.

> Remove usage of HiveContext in SparkR.
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> HiveContext is to be deprecated in 2.0. Replace its usages with 
> SparkSession.withHiveSupport in SparkR.
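
For context, a minimal Scala sketch of obtaining a Hive-enabled session through
the 2.0 builder API; whether SparkR ends up wrapping this builder or
SparkSession.withHiveSupport is exactly what this task has to settle, so treat
the calls below as an assumption:

{code}
import org.apache.spark.sql.SparkSession

// Assumed 2.0-style entry point, shown only to illustrate what the SparkR
// change would delegate to on the JVM side.
val spark = SparkSession.builder()
  .appName("sparkr-hive-example")
  .enableHiveSupport()
  .getOrCreate()
{code}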



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273391#comment-15273391
 ] 

Apache Spark commented on SPARK-15168:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/12943

> Add missing params to Python's MultilayerPerceptronClassifier
> -
>
> Key: SPARK-15168
> URL: https://issues.apache.org/jira/browse/SPARK-15168
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> MultilayerPerceptronClassifier is missing step size, solver, and weights. Add 
> these params.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15168:


Assignee: (was: Apache Spark)

> Add missing params to Python's MultilayerPerceptronClassifier
> -
>
> Key: SPARK-15168
> URL: https://issues.apache.org/jira/browse/SPARK-15168
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> MultilayerPerceptronClassifier is missing step size, solver, and weights. Add 
> these params.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15168:


Assignee: Apache Spark

> Add missing params to Python's MultilayerPerceptronClassifier
> -
>
> Key: SPARK-15168
> URL: https://issues.apache.org/jira/browse/SPARK-15168
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> MultilayerPerceptronClassifier is missing step size, solver, and weights. Add 
> these params.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15159) Remove usage of HiveContext in SparkR.

2016-05-05 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273379#comment-15273379
 ] 

Felix Cheung commented on SPARK-15159:
--

With the updated goal, this seems to be a fairly big change. How do we want to 
proceed?


> Remove usage of HiveContext in SparkR.
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> HiveContext is to be deprecated in 2.0. Replace its usages with 
> SparkSession.withHiveSupport in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15169) Consider improving HasSolver to allow generalization

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15169:
---

 Summary: Consider improving HasSolver to allow generalization
 Key: SPARK-15169
 URL: https://issues.apache.org/jira/browse/SPARK-15169
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: holdenk
Priority: Trivial


The current HasSolver shared param has a fixed default value of "auto" and no 
validation. Some algorithms (see `MultilayerPerceptronClassifier`) have 
different default values or validators. This results in either a mostly 
duplicated param (as in `MultilayerPerceptronClassifier`) or incorrect scaladoc 
(as in `GeneralizedLinearRegression`).
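
A rough Scala sketch of a per-algorithm solver param with its own default and
validator; the trait name and the allowed values are placeholders, not Spark's
shared HasSolver:

{code}
import org.apache.spark.ml.param.{Param, ParamValidators, Params}

// Illustrative only: a solver param whose default and validation differ from
// the generic shared param, which is what currently forces duplication.
trait HasConfigurableSolver extends Params {
  final val solver: Param[String] = new Param[String](this, "solver",
    "the solver algorithm used for optimization",
    ParamValidators.inArray[String](Array("l-bfgs", "gd")))

  setDefault(solver -> "l-bfgs")

  final def getSolver: String = $(solver)
}
{code}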



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-13566:
--
Assignee: cen yuhai

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
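
A tiny Scala sketch of the hazard the two stacks above illustrate, and the usual
way out (one global lock order); this is illustrative code only, not the
BlockManager's actual locking:

{code}
// Illustrative only. In the traces above, one thread holds a BlockInfo lock and
// waits for the memory manager, while the other holds the memory manager and
// waits for the BlockInfo lock. Taking the locks in a single fixed order on
// every path removes the cycle.
object LockOrderSketch {
  private val memoryManagerLock = new Object
  private val blockInfoLock = new Object

  def removeBlock(): Unit = memoryManagerLock.synchronized {
    blockInfoLock.synchronized {
      // drop the block from the memory store
    }
  }

  def evictToFreeSpace(): Unit = memoryManagerLock.synchronized {
    blockInfoLock.synchronized {
      // drop some other block to free space
    }
  }
}
{code}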



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread holdenk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk updated SPARK-15168:

Description: MultilayerPerceptronClassifier is missing step size, solver, 
and weights. Add these params.  (was: MultilayerPerceptronClassifier is missing 
Tol, solver, and weights. Add these params.)

> Add missing params to Python's MultilayerPerceptronClassifier
> -
>
> Key: SPARK-15168
> URL: https://issues.apache.org/jira/browse/SPARK-15168
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>
> MultilayerPerceptronClassifier is missing step size, solver, and weights. Add 
> these params.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15168) Add missing params to Python's MultilayerPerceptronClassifier

2016-05-05 Thread holdenk (JIRA)
holdenk created SPARK-15168:
---

 Summary: Add missing params to Python's 
MultilayerPerceptronClassifier
 Key: SPARK-15168
 URL: https://issues.apache.org/jira/browse/SPARK-15168
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: holdenk
Priority: Trivial


MultilayerPerceptronClassifier is missing Tol, solver, and weights. Add these 
params.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15159) Remove usage of HiveContext in SparkR.

2016-05-05 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-15159:

Description: HiveContext is to be deprecated in 2.0.  Replace them with 
SparkSession.withHiveSupport in SparkR  (was: HiveContext is to be deprecated 
in 2.0. However, there are several times of usage of HiveContext in SparkR unit 
test cases. Replace them with SparkSession.withHiveSupport .)

> Remove usage of HiveContext in SparkR.
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> HiveContext is to be deprecated in 2.0. Replace its usages with 
> SparkSession.withHiveSupport in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15159) Remove usage of HiveContext in SparkR.

2016-05-05 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-15159:

Summary: Remove usage of HiveContext in SparkR.  (was: Remove usage of 
HiveContext in SparkR unit test cases.)

> Remove usage of HiveContext in SparkR.
> --
>
> Key: SPARK-15159
> URL: https://issues.apache.org/jira/browse/SPARK-15159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Sun Rui
>
> HiveContext is to be deprecated in 2.0. However, HiveContext is used several 
> times in SparkR unit test cases. Replace these usages with 
> SparkSession.withHiveSupport.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15167:


Assignee: Andrew Or  (was: Apache Spark)

> Add public catalog implementation method to SparkSession
> 
>
> Key: SPARK-15167
> URL: https://issues.apache.org/jira/browse/SPARK-15167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now there's no way to check whether a given SparkSession has Hive 
> support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
> that's supposed to be hidden from the user.
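
A minimal Scala sketch of the check as it can be done today via the conf key
quoted above; the helper name is made up here purely to illustrate what a public
method on SparkSession might wrap:

{code}
import org.apache.spark.sql.SparkSession

// Illustrative only: peeks at the internal conf key mentioned in the issue.
// A public API would hide this key behind a method on SparkSession.
def hasHiveSupport(spark: SparkSession): Boolean =
  spark.conf.get("spark.sql.catalogImplementation") == "hive"
{code}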



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15167:


Assignee: Apache Spark  (was: Andrew Or)

> Add public catalog implementation method to SparkSession
> 
>
> Key: SPARK-15167
> URL: https://issues.apache.org/jira/browse/SPARK-15167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>
> Right now there's no way to check whether a given SparkSession has Hive 
> support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
> that's supposed to be hidden from the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273368#comment-15273368
 ] 

Apache Spark commented on SPARK-15167:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/12942

> Add public catalog implementation method to SparkSession
> 
>
> Key: SPARK-15167
> URL: https://issues.apache.org/jira/browse/SPARK-15167
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Right now there's no way to check whether a given SparkSession has Hive 
> support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
> that's supposed to be hidden from the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15166:


Assignee: Andrew Or  (was: Apache Spark)

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273354#comment-15273354
 ] 

Apache Spark commented on SPARK-15166:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/12941

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15166:


Assignee: Apache Spark  (was: Andrew Or)

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-15165:
-
Priority: Critical  (was: Major)

> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>Priority: Critical
>
> toCommentSafeString method replaces "\u" with "\ \u" to avoid codegen 
> breaking.
> But if an even number of "\" is put before "u", like "\ \u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}
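
An illustrative Scala sketch of one way to make such a comment-escaping helper
robust (escape every backslash first, then break up the comment terminator);
this is not Spark's actual fix, just a sketch of the idea:

{code}
// Illustrative only: doubling every backslash means no surviving "\u" sequence
// can be interpreted as a unicode escape by javac, and rewriting "*/" keeps the
// embedded text from terminating the generated comment early.
def toCommentSafeSketch(text: String): String =
  text.replace("\\", "\\\\")
      .replace("*/", "*\\/")
{code}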



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-15165:
-
Target Version/s: 2.0.0

> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> toCommentSafeString method replaces "\u" with "\ \u" to avoid codegen 
> breaking.
> But if an even number of "\" is put before "u", like "\ \u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15152) Scaladoc and Code style Improvements

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15152.
---
  Resolution: Fixed
Assignee: Jacek Laskowski
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Scaladoc and Code style Improvements
> 
>
> Key: SPARK-15152
> URL: https://issues.apache.org/jira/browse/SPARK-15152
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, Spark Core, SQL, YARN
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Jacek Laskowski
>Priority: Minor
> Fix For: 2.0.0
>
>
> While doing code reviews for the Spark Notes I found many places with typos 
> and incorrect code style.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15167) Add public catalog implementation method to SparkSession

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15167:
-

 Summary: Add public catalog implementation method to SparkSession
 Key: SPARK-15167
 URL: https://issues.apache.org/jira/browse/SPARK-15167
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or


Right now there's no way to check whether a given SparkSession has Hive 
support. You can do `spark.conf.get("spark.sql.catalogImplementation")` but 
that's supposed to be hidden from the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15166) Move hive-specific conf setting from SparkSession

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15166:
--
Summary: Move hive-specific conf setting from SparkSession  (was: Move 
hive-specific conf setting to HiveSharedState)

> Move hive-specific conf setting from SparkSession
> -
>
> Key: SPARK-15166
> URL: https://issues.apache.org/jira/browse/SPARK-15166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15166) Move hive-specific conf setting to HiveSharedState

2016-05-05 Thread Andrew Or (JIRA)
Andrew Or created SPARK-15166:
-

 Summary: Move hive-specific conf setting to HiveSharedState
 Key: SPARK-15166
 URL: https://issues.apache.org/jira/browse/SPARK-15166
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-15165:
---
Description: 
toCommentSafeString method replaces "\u" with "\ \u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\ \u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}


  was:
toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}



> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> toCommentSafeString method replaces "\u" with "\ \u" to avoid codegen 
> breaking.
> But if an even number of "\" is put before "u", like "\ \u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-15165:
---
Description: 
toCommentSafeString method replaces "\u" with "u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}


  was:
toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}



> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> toCommentSafeString method replaces "\u" with "u" to avoid codegen 
> breaking.
> But if an even number of "\" is put before "u", like "\\u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-15165:
---
Description: 
toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}


  was:
toCommentSafeString method replaces "\u" with "\\\u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}



> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
> But if an even number of "\" is put before "u", like "\\u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-15165:
---
Description: 
toCommentSafeString method replaces "\u" with "\\\u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}


  was:
toCommentSafeString method replaces "\u" with "u" to avoid codegen breaking.
But if the even number of "\" is put before "u", like "\\u", in the string 
literal in the query, codegen can break.

Following code occurs compilation error.

{code}
val df = Seq(...).toDF
df.select("'u002A/'").show
{code}

The reason of the compilation error is because "u002A/" is translated 
into "*/" (the end of comment). 

Due to this unsafety, arbitrary code can be injected like as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'u002A/{System.exit(1);}/*'").show
{code}



> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> toCommentSafeString method replaces "\u" with "\\\u" to avoid codegen 
> breaking.
> But if an even number of "\" is put before "u", like "\\u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15074) Spark shuffle service bottlenecked while fetching large amount of intermediate data

2016-05-05 Thread Sital Kedia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273241#comment-15273241
 ] 

Sital Kedia edited comment on SPARK-15074 at 5/5/16 10:32 PM:
--

Okay, I made a change to cache the index file, and that made shuffle reads twice 
as fast. I am going to put out a PR for that change soon.

Now I see the shuffle service is spending most of its time in the 
FileChannelImpl.transferTo method (refer to the stack trace below). I wonder if 
there is a way to speed it up further?

{code}
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607)
at 
org.apache.spark.network.buffer.LazyFileRegion.transferTo(LazyFileRegion.java:96)
at 
org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:89)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:254)
at 
io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:237)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:281)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:761)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:311)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:729)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1127)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)
at 
io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)
at 
io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:693)
at 
io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:681)
at 
io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:716)
at 
io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:954)
at 
io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:244)
at 
org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:184)
at 
org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:129)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:100)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Abs

[jira] [Commented] (SPARK-15074) Spark shuffle service bottlenecked while fetching large amount of intermediate data

2016-05-05 Thread Sital Kedia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273241#comment-15273241
 ] 

Sital Kedia commented on SPARK-15074:
-

Okay, I made a change to cache the index file, and that made shuffle reads twice 
as fast. I am going to put out a PR for that change soon.

Now I see the shuffle service is spending most of its time in the 
FileChannelImpl.transferTo method (refer to the stack trace below). I wonder if 
there is a way to speed it up further?


java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:607)
at 
org.apache.spark.network.buffer.LazyFileRegion.transferTo(LazyFileRegion.java:96)
at 
org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:89)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:254)
at 
io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:237)
at 
io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:281)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:761)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:311)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:729)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1127)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)
at 
io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)
at 
io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
at 
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:693)
at 
io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:681)
at 
io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:716)
at 
io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:954)
at 
io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:244)
at 
org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:184)
at 
org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:129)
at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:100)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at 
io.net

[jira] [Assigned] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15165:


Assignee: Apache Spark

> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>
> toCommentSafeString method replaces "\u" with "\\u" to avoid codegen breaking.
> But if an even number of "\" is put before "u", like "\\u", in the string 
> literal in the query, codegen can break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'u002A/'").show
> {code}
> The compilation error occurs because "u002A/" is translated into "*/" (the 
> end of a comment).
> Because of this unsafe handling, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273238#comment-15273238
 ] 

Apache Spark commented on SPARK-15165:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/12939

> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> The toCommentSafeString method replaces "\u" with "\\u" to avoid breaking codegen.
> But if an even number of "\" characters is put before "u", as in "\\u", in a 
> string literal in the query, codegen can still break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'\\u002A/'").show
> {code}
> The compilation error occurs because "\\u002A/" is translated into "*/" (the 
> end of a comment).
> Due to this unsafety, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'\\u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15165:


Assignee: (was: Apache Spark)

> Codegen can break because toCommentSafeString is not actually safe
> --
>
> Key: SPARK-15165
> URL: https://issues.apache.org/jira/browse/SPARK-15165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>
> The toCommentSafeString method replaces "\u" with "\\u" to avoid breaking codegen.
> But if an even number of "\" characters is put before "u", as in "\\u", in a 
> string literal in the query, codegen can still break.
> The following code causes a compilation error.
> {code}
> val df = Seq(...).toDF
> df.select("'\\u002A/'").show
> {code}
> The compilation error occurs because "\\u002A/" is translated into "*/" (the 
> end of a comment).
> Due to this unsafety, arbitrary code can be injected, as follows.
> {code}
> val df = Seq(...).toDF
> // Inject "System.exit(1)"
> df.select("'\\u002A/{System.exit(1);}/*'").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15165) Codegen can break because toCommentSafeString is not actually safe

2016-05-05 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-15165:
--

 Summary: Codegen can break because toCommentSafeString is not 
actually safe
 Key: SPARK-15165
 URL: https://issues.apache.org/jira/browse/SPARK-15165
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Kousuke Saruta


The toCommentSafeString method replaces "\u" with "\\u" to avoid breaking codegen.
But if an even number of "\" characters is put before "u", as in "\\u", in a 
string literal in the query, codegen can still break.

The following code causes a compilation error.

{code}
val df = Seq(...).toDF
df.select("'\\u002A/'").show
{code}

The compilation error occurs because "\\u002A/" is translated into "*/" (the 
end of a comment).

Due to this unsafety, arbitrary code can be injected, as follows.

{code}
val df = Seq(...).toDF
// Inject "System.exit(1)"
df.select("'\\u002A/{System.exit(1);}/*'").show
{code}
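
For reference, a hedged sketch of one way to neutralise the problem: doubling 
every backslash prevents any "\uXXXX" sequence from being read as a Java 
unicode escape inside the generated comment, and breaking up "*/" prevents the 
comment from being terminated early. This is only an illustrative sketch, not 
necessarily the fix that ends up in the linked pull request.

{code}
// Hedged sketch only: make a string safe to embed in a generated Java block
// comment by disarming unicode escapes and the comment terminator.
def commentSafe(str: String): String =
  str.replace("\\", "\\\\")  // "\u002A" becomes "\\u002A", no longer a unicode escape
     .replace("*/", "*\\/")  // split the "*/" token so the comment cannot end early
{code}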




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-14977) Fine grained mode in Mesos is not fair

2016-05-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-14977.
---
Resolution: Not A Problem

> Fine grained mode in Mesos is not fair
> --
>
> Key: SPARK-14977
> URL: https://issues.apache.org/jira/browse/SPARK-14977
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained
>Reporter: Luca Bruno
>
> I've set up a Mesos cluster and I'm running Spark in fine-grained mode.
> Spark defaults to 2 executor cores and 2 GB of RAM.
> The Mesos cluster has 8 cores and 8 GB of RAM in total.
> When I submit two Spark jobs simultaneously, Spark always accepts the full 
> offers, so the two frameworks use 4 GB of RAM each instead of 2 GB.
> If I then submit another Spark job, it is not offered resources by Mesos, at 
> least with the default HierarchicalDRF allocator module.
> Mesos keeps offering 4 GB of RAM to the earlier Spark jobs, and Spark keeps 
> accepting the full resources for every new task.
> Hence new Spark jobs have no chance of getting a share.
> Is this something to be solved with a custom Mesos allocator? Should Spark be 
> fairer instead? Or could a configuration option be provided to always accept 
> only the minimum resources?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14977) Fine grained mode in Mesos is not fair

2016-05-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273202#comment-15273202
 ] 

Michael Gummelt commented on SPARK-14977:
-

[~lethalman]: Fine-grained mode only releases cores, not memory. It's 
impossible for us to shrink the memory allocation without OOM-ing the executor, 
because the JVM doesn't relinquish memory back to the OS.

You can use dynamic allocation to terminate entire executors as they become 
idle.

Also, FYI, fine-grained mode will soon be deprecated in favor of dynamic 
allocation.
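
For reference, a hedged sketch of what moving from fine-grained mode to 
dynamic allocation looks like in configuration terms. The property names are 
standard Spark settings; the values and the external shuffle service note are 
assumptions about a typical Mesos setup, not recommendations.

{code}
// Hedged sketch: coarse-grained Mesos executors plus dynamic allocation
// instead of fine-grained mode. Values are illustrative only.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.mesos.coarse", "true")                // run coarse-grained executors
  .set("spark.dynamicAllocation.enabled", "true")   // release executors as they go idle
  .set("spark.shuffle.service.enabled", "true")     // external shuffle service assumed on agents
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "4")
{code}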

> Fine grained mode in Mesos is not fair
> --
>
> Key: SPARK-14977
> URL: https://issues.apache.org/jira/browse/SPARK-14977
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained
>Reporter: Luca Bruno
>
> I've set up a Mesos cluster and I'm running Spark in fine-grained mode.
> Spark defaults to 2 executor cores and 2 GB of RAM.
> The Mesos cluster has 8 cores and 8 GB of RAM in total.
> When I submit two Spark jobs simultaneously, Spark always accepts the full 
> offers, so the two frameworks use 4 GB of RAM each instead of 2 GB.
> If I then submit another Spark job, it is not offered resources by Mesos, at 
> least with the default HierarchicalDRF allocator module.
> Mesos keeps offering 4 GB of RAM to the earlier Spark jobs, and Spark keeps 
> accepting the full resources for every new task.
> Hence new Spark jobs have no chance of getting a share.
> Is this something to be solved with a custom Mesos allocator? Should Spark be 
> fairer instead? Or could a configuration option be provided to always accept 
> only the minimum resources?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15162:


Assignee: Apache Spark

> Update PySpark LogisticRegression threshold PyDoc to be as complete as 
> Scaladoc
> ---
>
> Key: SPARK-15162
> URL: https://issues.apache.org/jira/browse/SPARK-15162
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> The PyDoc for setting and getting the threshold in logistic regression 
> doesn't have the same level of detail as the Scaladoc does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273180#comment-15273180
 ] 

Apache Spark commented on SPARK-15162:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/12938

> Update PySpark LogisticRegression threshold PyDoc to be as complete as 
> Scaladoc
> ---
>
> Key: SPARK-15162
> URL: https://issues.apache.org/jira/browse/SPARK-15162
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Trivial
>
> The PyDoc for setting and getting the threshold in logistic regression 
> doesn't have the same level of detail as the Scaladoc does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-05-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273181#comment-15273181
 ] 

Apache Spark commented on SPARK-15164:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/12938

> Mark classification algorithms as experimental where marked so in scala
> ---
>
> Key: SPARK-15164
> URL: https://issues.apache.org/jira/browse/SPARK-15164
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15164:


Assignee: Apache Spark

> Mark classification algorithms as experimental where marked so in scala
> ---
>
> Key: SPARK-15164
> URL: https://issues.apache.org/jira/browse/SPARK-15164
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15162) Update PySpark LogisticRegression threshold PyDoc to be as complete as Scaladoc

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15162:


Assignee: (was: Apache Spark)

> Update PySpark LogisticRegression threshold PyDoc to be as complete as 
> Scaladoc
> ---
>
> Key: SPARK-15162
> URL: https://issues.apache.org/jira/browse/SPARK-15162
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Trivial
>
> The PyDoc for setting and getting the threshold in logistic regression 
> doesn't have the same level of detail as the Scaladoc does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15164) Mark classification algorithms as experimental where marked so in scala

2016-05-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15164:


Assignee: (was: Apache Spark)

> Mark classification algorithms as experimental where marked so in scala
> ---
>
> Key: SPARK-15164
> URL: https://issues.apache.org/jira/browse/SPARK-15164
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14893) Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14893.
---
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.0.0

> Re-enable HiveSparkSubmitSuite SPARK-8489 test after HiveContext is removed
> ---
>
> Key: SPARK-14893
> URL: https://issues.apache.org/jira/browse/SPARK-14893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Dilip Biswal
> Fix For: 2.0.0
>
>
> The test was disabled in https://github.com/apache/spark/pull/12585. To 
> re-enable it we need to rebuild the jar using the updated source code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14812) ML, Graph 2.0 QA: API: Experimental, DeveloperApi, final, sealed audit

2016-05-05 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai reassigned SPARK-14812:
---

Assignee: DB Tsai

> ML, Graph 2.0 QA: API: Experimental, DeveloperApi, final, sealed audit
> --
>
> Key: SPARK-14812
> URL: https://issues.apache.org/jira/browse/SPARK-14812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: DB Tsai
>
> We should make a pass through the items marked as Experimental or 
> DeveloperApi and see if any are stable enough to be unmarked.
> We should also check for items marked final or sealed to see if they are 
> stable enough to be opened up as APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-15158:
--
Assignee: Kai Wang

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Assignee: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at the executor 
> level, like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?
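
A hedged sketch of the downgrade under discussion; the interpolated variable 
names mirror the linked line of RollingPolicy.scala and should be treated as 
illustrative rather than a verbatim patch.

{code}
// Hedged sketch: replace the logInfo call at the linked line with a
// debug-level message so the size check only appears when debugging rolling.
logDebug(s"$bytesWrittenSinceRotation + $bytes > $rolloverSizeBytes")
{code}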



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9926) Parallelize file listing for partitioned Hive table

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-9926.
--
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Parallelize file listing for partitioned Hive table
> ---
>
> Key: SPARK-9926
> URL: https://issues.apache.org/jira/browse/SPARK-9926
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Cheolsoo Park
>Assignee: Ryan Blue
> Fix For: 2.0.0
>
>
> In Spark SQL, short queries like {{select * from table limit 10}} run very 
> slowly against partitioned Hive tables because of file listing. In 
> particular, if a large number of partitions are scanned on storage like S3, 
> the queries run extremely slowly. Here are some example benchmarks from my 
> environment:
> * Parquet-backed Hive table
> * Partitioned by dateint and hour
> * Stored on S3
> ||\# of partitions||\# of files||runtime||query||
> |1|972|30 secs|select * from nccp_log where dateint=20150601 and hour=0 limit 10;|
> |24|13646|6 mins|select * from nccp_log where dateint=20150601 limit 10;|
> |240|136222|1 hour|select * from nccp_log where dateint>=20150601 and dateint<=20150610 limit 10;|
> The problem is that {{TableReader}} constructs a separate HadoopRDD per Hive 
> partition path and groups them into a UnionRDD. Then, all the input files are 
> listed sequentially. In other tools such as Hive and Pig, this can be solved 
> by setting 
> [mapreduce.input.fileinputformat.list-status.num-threads|https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml]
>  high. But in Spark, since each HadoopRDD lists only one partition path, 
> setting this property doesn't help.
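
The idea behind parallelizing the listing can be sketched as follows. This is 
a hedged illustration using Scala parallel collections and the Hadoop 
FileSystem API, not the actual Spark implementation; the helper name and 
thread count are assumptions.

{code}
// Hedged sketch: list many partition directories concurrently instead of
// sequentially, bounding concurrency with an explicit thread pool.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}
import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

def listPartitionsInParallel(
    paths: Seq[Path],
    hadoopConf: Configuration,
    numThreads: Int): Seq[FileStatus] = {
  val parPaths = paths.par
  parPaths.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(numThreads))
  parPaths.flatMap { path =>
    val fs = path.getFileSystem(hadoopConf)
    fs.listStatus(path).toSeq  // one listing call per partition, now issued concurrently
  }.seq
}
{code}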



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15158) Too aggressive logging in SizeBasedRollingPolicy?

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15158.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Too aggressive logging in SizeBasedRollingPolicy?
> -
>
> Key: SPARK-15158
> URL: https://issues.apache.org/jira/browse/SPARK-15158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Kai Wang
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The questionable line is this: 
> https://github.com/apache/spark/blob/3e27940a19e7bab448f1af11d2065ecd1ec66197/core/src/main/scala/org/apache/spark/util/logging/RollingPolicy.scala#L116
> This will output a message *whenever* anything is logged at the executor 
> level, like the following:
> SizeBasedRollingPolicy:59 83 + 140796 > 1048576
> SizeBasedRollingPolicy:59 83 + 140879 > 1048576
> SizeBasedRollingPolicy:59 83 + 140962 > 1048576
> ...
> This seems too aggressive. Should this at least be downgraded to debug level?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15134) Indent SparkSession builder patterns and update binary_classification_metrics_example.py

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15134.
---
  Resolution: Fixed
Assignee: Dongjoon Hyun
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Indent SparkSession builder patterns and update 
> binary_classification_metrics_example.py
> 
>
> Key: SPARK-15134
> URL: https://issues.apache.org/jira/browse/SPARK-15134
> Project: Spark
>  Issue Type: Task
>  Components: Examples
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> This issue addresses the comments in SPARK-15031 and also fixes java-linter 
> errors.
> - Use multiline format in SparkSession builder patterns.
> - Update `binary_classification_metrics_example.py` to use `SparkSession`.
> - Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far)
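
The multiline builder format referred to in the first bullet above looks 
roughly like this. A hedged Scala illustration only: the example script being 
updated is Python, and the app name here is made up.

{code}
// Hedged illustration of the multiline SparkSession builder pattern.
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("BinaryClassificationMetricsExample")  // hypothetical app name
  .getOrCreate()
{code}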



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15135) Make sure SparkSession thread safe

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15135.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make sure SparkSession thread safe
> --
>
> Key: SPARK-15135
> URL: https://issues.apache.org/jira/browse/SPARK-15135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> Fixed non-thread-safe classes used by SparkSession.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-05 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15072.
---
Resolution: Fixed

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv

2016-05-05 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273117#comment-15273117
 ] 

Alex Bozarth commented on SPARK-10653:
--

I'm currently running tests on a fix for this and will open a PR afterwards. I 
have removed blockTransferService and sparkFilesDir and replaced the few 
references to them. ExecutorMemoryManager was already removed in SPARK-10984. I 
also took a quick look at the other vals in the constructor and didn't see any 
other low-hanging fruit to remove.

> Remove unnecessary things from SparkEnv
> ---
>
> Key: SPARK-10653
> URL: https://issues.apache.org/jira/browse/SPARK-10653
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> As of the writing of this message, there are at least two things that can be 
> removed from it:
> {code}
> @DeveloperApi
> class SparkEnv (
> val executorId: String,
> private[spark] val rpcEnv: RpcEnv,
> val serializer: Serializer,
> val closureSerializer: Serializer,
> val cacheManager: CacheManager,
> val mapOutputTracker: MapOutputTracker,
> val shuffleManager: ShuffleManager,
> val broadcastManager: BroadcastManager,
> val blockTransferService: BlockTransferService, // this one can go
> val blockManager: BlockManager,
> val securityManager: SecurityManager,
> val httpFileServer: HttpFileServer,
> val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
> val metricsSystem: MetricsSystem,
> val shuffleMemoryManager: ShuffleMemoryManager,
> val executorMemoryManager: ExecutorMemoryManager, // this can go
> val outputCommitCoordinator: OutputCommitCoordinator,
> val conf: SparkConf) extends Logging {
>   ...
> }
> {code}
> We should avoid adding to this infinite list of things in SparkEnv's 
> constructors if they're not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15140) ensure input object of encoder is not null

2016-05-05 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273100#comment-15273100
 ] 

Michael Armbrust commented on SPARK-15140:
--

The 2.0 behavior seems correct.  Ideally .toDS().collect() will always 
round-trip the data without change.
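
A hedged illustration of the round-trip in question, assuming a SparkSession 
named `spark` with its implicits in scope:

{code}
// Hedged sketch of the behaviour described below: in 2.0 a null element
// round-trips through a Dataset, whereas the same collect threw an NPE in 1.6.
import spark.implicits._

val ds = Seq("a", null).toDS()
ds.collect()  // Array("a", null)
{code}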

> ensure input object of encoder is not null
> --
>
> Key: SPARK-15140
> URL: https://issues.apache.org/jira/browse/SPARK-15140
> Project: Spark
>  Issue Type: Improvement
>Reporter: Wenchen Fan
>
> Currently we assume the input object for an encoder won't be null, but we 
> don't check it. For example, in 1.6 `Seq("a", null).toDS.collect` throws an 
> NPE, while in 2.0 it returns Array("a", null).
> We should define this behaviour more clearly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


