[jira] [Updated] (SPARK-14552) ReValue wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Singh updated SPARK-14552:
---
Description: 
Implement the wrapper for VectorIndexer.

The idea is inspired by the plyr package in R:

x <- c("a", "b", "c")
revalue(x, c(a = "1", c = "2"))



  was:
Implement the wrapper for VectorIndexer.

In R, with the plyr package, one can do the following:


x <- c("a", "b", "c")
revalue(x, c(a = "1", c = "2"))




> ReValue wrapper for SparkR
> --
>
> Key: SPARK-14552
> URL: https://issues.apache.org/jira/browse/SPARK-14552
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> Implement the wrapper for VectorIndexer.
> The idea is inspired by the plyr package in R:
> x <- c("a", "b", "c")
> revalue(x, c(a = "1", c = "2"))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter defined

2016-04-11 Thread Kashish Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kashish Jain updated SPARK-14557:
-
Remaining Estimate: (was: 168h)
 Original Estimate: (was: 168h)

> CTAS (save as textfile) doesn't work with pathFilter defined
> 
>
> Key: SPARK-14557
> URL: https://issues.apache.org/jira/browse/SPARK-14557
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.3.2, 1.5.2
>Reporter: Kashish Jain
>
> When the pathFilter is enabled in hive-site.xml, queries fail on a table 
> created through CTAS.
> Query fired for creating the table:
> create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from 
> <source_table> limit 5
> Query which fails: Select * from CTAS1
> Exception observed:
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal 
> character in scheme name at index 10: part-0,hdfs:
> at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> at org.apache.hadoop.fs.Path.<init>(Path.java:172)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter enabled

2016-04-11 Thread Kashish Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kashish Jain updated SPARK-14557:
-
Summary: CTAS (save as textfile) doesn't work with pathFilter enabled  
(was: CTAS (save as textfile) doesn't work with pathFilter defined)

> CTAS (save as textfile) doesn't work with pathFilter enabled
> 
>
> Key: SPARK-14557
> URL: https://issues.apache.org/jira/browse/SPARK-14557
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.3.2, 1.5.2
>Reporter: Kashish Jain
>
> When the pathFilter is enabled in hive-site.xml, queries fail on a table 
> created through CTAS.
> Query fired for creating the table:
> create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from 
> <source_table> limit 5
> Query which fails: Select * from CTAS1
> Exception observed:
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal 
> character in scheme name at index 10: part-0,hdfs:
> at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> at org.apache.hadoop.fs.Path.<init>(Path.java:172)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter defined

2016-04-11 Thread Kashish Jain (JIRA)
Kashish Jain created SPARK-14557:


 Summary: CTAS (save as textfile) doesn't work with pathFilter 
defined
 Key: SPARK-14557
 URL: https://issues.apache.org/jira/browse/SPARK-14557
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.2, 1.3.1, 1.3.2
Reporter: Kashish Jain


When the pathFilter is enabled in hive-site.xml, queries fail on a table 
created through CTAS.

Query fired for creating the table:
create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from 
<source_table> limit 5

Query which fails: Select * from CTAS1

Exception observed:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal 
character in scheme name at index 10: part-0,hdfs:
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
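
For context, a pathFilter of the kind referenced above is a small Hadoop class 
registered in hive-site.xml; a minimal Scala sketch follows (the class name and 
filtering rule are illustrative assumptions, not the actual filter in use):

{code}
import org.apache.hadoop.fs.{Path, PathFilter}

// Hypothetical filter: skip hidden/temporary files (names starting with "_" or ".").
class HiddenFileFilter extends PathFilter {
  override def accept(path: Path): Boolean = {
    val name = path.getName
    !name.startsWith("_") && !name.startsWith(".")
  }
}
{code}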




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14535) Remove buildInternalScan from FileFormat

2016-04-11 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14535.
--
Resolution: Fixed

Issue resolved by pull request 12300
[https://github.com/apache/spark/pull/12300]

> Remove buildInternalScan from FileFormat
> 
>
> Key: SPARK-14535
> URL: https://issues.apache.org/jira/browse/SPARK-14535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14554) disable whole stage codegen if there are too many input columns

2016-04-11 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14554.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12322
[https://github.com/apache/spark/pull/12322]

> disable whole stage codegen if there are too many input columns
> ---
>
> Key: SPARK-14554
> URL: https://issues.apache.org/jira/browse/SPARK-14554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Critical
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR

2016-04-11 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236638#comment-15236638
 ] 

Narine Kokhlikyan commented on SPARK-12922:
---

[~sunrui], Thank you very much for the explanation!
Now I got it!

> Implement gapply() on DataFrame in SparkR
> -
>
> Key: SPARK-12922
> URL: https://issues.apache.org/jira/browse/SPARK-12922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Sun Rui
>
> gapply() applies an R function to groups formed by one or more columns of a 
> DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() 
> in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
> {code}
> R function input: the grouping key's value and a local data.frame of the 
> grouped data.
> R function output: a local data.frame.
> The schema specifies the Row format of the output of the R function. It must 
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported; users 
> could do map-side combination via dapply().
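> Since the description compares gapply() to flatMapGroups(), here is a minimal 
> Scala sketch of that Dataset-side analogue (data and names are illustrative; 
> assumes a SparkSession with `import spark.implicits._` in scope):
> {code}
> // Group by the first tuple field and sum the second per group, roughly what
> // gapply(df, "key", function(key, group) ..., schema) would do in R.
> val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
> val sums = ds.groupByKey(_._1).flatMapGroups { (key, values) =>
>   Iterator((key, values.map(_._2).sum))
> }
> {code}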



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR

2016-04-11 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236622#comment-15236622
 ] 

Sun Rui commented on SPARK-12922:
-

[~Narine] DataFrame and Dataset are now converged: DataFrame is just a 
different view of Dataset, that is, Dataset[Row]. So groupByKey is the same 
method for both Dataset and DataFrame, but the `func` is different because the 
data element view is different, for example:
{code}
val ds = Seq((1,2), (3,4)).toDS
val gd = ds.groupByKey(v=>v._1)
val df = ds.toDF
val gd1 = df.groupByKey(r=>r.getInt(0))
{code}


> Implement gapply() on DataFrame in SparkR
> -
>
> Key: SPARK-12922
> URL: https://issues.apache.org/jira/browse/SPARK-12922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Sun Rui
>
> gapply() applies an R function to groups formed by one or more columns of a 
> DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() 
> in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
> {code}
> R function input: the grouping key's value and a local data.frame of the 
> grouped data.
> R function output: a local data.frame.
> The schema specifies the Row format of the output of the R function. It must 
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported; users 
> could do map-side combination via dapply().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14554) disable whole stage codegen if there are too many input columns

2016-04-11 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-14554:

Summary: disable whole stage codegen if there are too many input columns  
(was: Dataset.map may generate wrong java code for wide table)

> disable whole stage codegen if there are too many input columns
> ---
>
> Key: SPARK-14554
> URL: https://issues.apache.org/jira/browse/SPARK-14554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14546) Scale Wrapper in SparkR

2016-04-11 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236556#comment-15236556
 ] 

Yong Tang commented on SPARK-14546:
---

[~aloknsingh] I can work on this one if no one has started yet. Thanks.

> Scale Wrapper in SparkR
> ---
>
> Key: SPARK-14546
> URL: https://issues.apache.org/jira/browse/SPARK-14546
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> ML has the StandardScaler, which seems to be very commonly used.
> This JIRA is to implement the SparkR wrapper for it.
> Here is the R scale command:
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html
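> For reference, a minimal Scala sketch of the underlying ML transformer the 
> wrapper would drive (assumes a DataFrame `df` with a Vector column named 
> "features"; column names are illustrative):
> {code}
> import org.apache.spark.ml.feature.StandardScaler
>
> val scaler = new StandardScaler()
>   .setInputCol("features")
>   .setOutputCol("scaledFeatures")
>   .setWithMean(true)   // center, like R's scale(x, center = TRUE)
>   .setWithStd(true)    // scale to unit std, like scale(x, scale = TRUE)
>
> val scaled = scaler.fit(df).transform(df)
> {code}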



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14551) Reduce number of NameNode calls in OrcRelation with FileSourceStrategy mode

2016-04-11 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-14551:
-
Summary: Reduce number of NameNode calls in OrcRelation with 
FileSourceStrategy mode  (was: Reduce number of NN calls in OrcRelation with 
FileSourceStrategy mode)

> Reduce number of NameNode calls in OrcRelation with FileSourceStrategy mode
> ---
>
> Key: SPARK-14551
> URL: https://issues.apache.org/jira/browse/SPARK-14551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> When FileSourceStrategy is used, a record reader is created, which incurs a 
> NN call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up 
> reading the file information to get the ObjectInspector, which incurs an 
> additional NN call. It would be good to avoid this additional NN call 
> (specifically for partitioned datasets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14132) [Table related commands] Alter partition

2016-04-11 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-14132.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12220
[https://github.com/apache/spark/pull/12220]

> [Table related commands] Alter partition
> 
>
> Key: SPARK-14132
> URL: https://issues.apache.org/jira/browse/SPARK-14132
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> For alter partition commands, we have the following tokens:
> TOK_ALTERTABLE_ADDPARTS
> TOK_ALTERTABLE_DROPPARTS
> TOK_MSCK
> TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE
> For data source tables, we should throw exceptions.
> For Hive tables, we should support add and drop partitions. For now, it 
> should be fine to throw an exception for the rest.
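> For illustration, the Hive-table commands these tokens correspond to could 
> look like this (table and partition names are example values; assumes a 
> SparkSession `spark`):
> {code}
> spark.sql("ALTER TABLE logs ADD PARTITION (dt='2016-04-11')")   // TOK_ALTERTABLE_ADDPARTS
> spark.sql("ALTER TABLE logs DROP PARTITION (dt='2016-04-11')")  // TOK_ALTERTABLE_DROPPARTS
> spark.sql("MSCK REPAIR TABLE logs")                             // TOK_MSCK
> {code}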



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML

2016-04-11 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236541#comment-15236541
 ] 

Yong Tang commented on SPARK-14409:
---

[~mlnick] [~josephkb] I added a short doc on Google Drive with commenting enabled:
https://docs.google.com/document/d/1YEvf5eEm2vRcALJs39yICWmUx6xFW5j8DvXFWbRbStE/edit?usp=sharing
Please let me know if you have any feedback. Thanks!

> Investigate adding a RankingEvaluator to ML
> ---
>
> Key: SPARK-14409
> URL: https://issues.apache.org/jira/browse/SPARK-14409
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Nick Pentreath
>Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no 
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful 
> for recommendation evaluation (and potentially in other settings as well).
> This should be thought about in conjunction with adding the "recommendAll" 
> methods in SPARK-13857, so that top-k ranking metrics can be used in 
> cross-validators.
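> For context, a minimal sketch of the existing RDD-based {{RankingMetrics}} 
> that a DataFrame-based evaluator would mirror (the data is illustrative; 
> assumes a SparkContext `sc`):
> {code}
> import org.apache.spark.mllib.evaluation.RankingMetrics
>
> // Each element pairs the recommended items with the truly relevant items.
> val predictionAndLabels = sc.parallelize(Seq(
>   (Array(1, 2, 3), Array(1, 3)),
>   (Array(4, 5), Array(5))
> ))
> val metrics = new RankingMetrics(predictionAndLabels)
> println(metrics.precisionAt(2))         // precision@k
> println(metrics.meanAveragePrecision)   // MAP
> {code}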



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14554) Dataset.map may generate wrong java code for wide table

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14554:


Assignee: Wenchen Fan  (was: Apache Spark)

> Dataset.map may generate wrong java code for wide table
> ---
>
> Key: SPARK-14554
> URL: https://issues.apache.org/jira/browse/SPARK-14554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14556:


Assignee: Apache Spark

> Code clean-ups for package o.a.s.sql.execution.streaming.state
> --
>
> Key: SPARK-14556
> URL: https://issues.apache.org/jira/browse/SPARK-14556
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Liwei Lin
>Assignee: Apache Spark
>Priority: Minor
>
> - `StateStoreConf.**max**DeltasForSnapshot` was renamed to 
> `StateStoreConf.**min**DeltasForSnapshot`
> - some state switch checks were added
> - improved consistency between method names and string literals



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14554) Dataset.map may generate wrong java code for wide table

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14554:


Assignee: Apache Spark  (was: Wenchen Fan)

> Dataset.map may generate wrong java code for wide table
> ---
>
> Key: SPARK-14554
> URL: https://issues.apache.org/jira/browse/SPARK-14554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236543#comment-15236543
 ] 

Apache Spark commented on SPARK-14556:
--

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12323

> Code clean-ups for package o.a.s.sql.execution.streaming.state
> --
>
> Key: SPARK-14556
> URL: https://issues.apache.org/jira/browse/SPARK-14556
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Liwei Lin
>Priority: Minor
>
> - `StateStoreConf.**max**DeltasForSnapshot` was renamed to 
> `StateStoreConf.**min**DeltasForSnapshot`
> - some state switch checks were added
> - improved consistency between method names and string literals



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14556:


Assignee: (was: Apache Spark)

> Code clean-ups for package o.a.s.sql.execution.streaming.state
> --
>
> Key: SPARK-14556
> URL: https://issues.apache.org/jira/browse/SPARK-14556
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Liwei Lin
>Priority: Minor
>
> - `StateStoreConf.**max**DeltasForSnapshot` was renamed to 
> `StateStoreConf.**min**DeltasForSnapshot`
> - some state switch checks were added
> - improved consistency between method names and string literals



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14554) Dataset.map may generate wrong java code for wide table

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236544#comment-15236544
 ] 

Apache Spark commented on SPARK-14554:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/12322

> Dataset.map may generate wrong java code for wide table
> ---
>
> Key: SPARK-14554
> URL: https://issues.apache.org/jira/browse/SPARK-14554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state

2016-04-11 Thread Liwei Lin (JIRA)
Liwei Lin created SPARK-14556:
-

 Summary: Code clean-ups for package 
o.a.s.sql.execution.streaming.state
 Key: SPARK-14556
 URL: https://issues.apache.org/jira/browse/SPARK-14556
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Liwei Lin
Priority: Minor


- `StateStoreConf.**max**DeltasForSnapshot` was renamed to 
`StateStoreConf.**min**DeltasForSnapshot`
- some state switch checks were added
- improved consistency between method names and string literals



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14362) DDL Native Support: Drop View

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236529#comment-15236529
 ] 

Apache Spark commented on SPARK-14362:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/12321

> DDL Native Support: Drop View
> -
>
> Key: SPARK-14362
> URL: https://issues.apache.org/jira/browse/SPARK-14362
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> Native parsing and native analysis of the DDL command: Drop View.
> Based on the Hive DDL document for 
> [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), 
> `DROP VIEW` is defined as:
> Syntax:
> {noformat}
> DROP VIEW [IF EXISTS] [db_name.]view_name;
> {noformat}
>  - removes metadata for the specified view
>  - it is illegal to use DROP TABLE on a view
>  - it is illegal to use DROP VIEW on a table
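> A brief illustration of these semantics (view and table names are example 
> values; assumes a SparkSession `spark`):
> {code}
> spark.sql("CREATE VIEW v1 AS SELECT 1 AS id")
> spark.sql("DROP VIEW IF EXISTS v1")   // ok: removes the view's metadata
> // spark.sql("DROP TABLE v1")         // illegal: v1 is a view, not a table
> {code}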



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14406) Drop Table

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236530#comment-15236530
 ] 

Apache Spark commented on SPARK-14406:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/12321

> Drop Table
> --
>
> Key: SPARK-14406
> URL: https://issues.apache.org/jira/browse/SPARK-14406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> Right now, the DropTable command is in the hive module. We should remove the 
> call to runSqlHive and move the command to sql/core.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14555) Python API for methods introduced for Structured Streaming

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14555:


Assignee: (was: Apache Spark)

> Python API for methods introduced for Structured Streaming
> --
>
> Key: SPARK-14555
> URL: https://issues.apache.org/jira/browse/SPARK-14555
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Streaming
>Reporter: Burak Yavuz
>
> Methods added for Structured Streaming don't have a Python API yet.
> We need to provide APIs for the new methods in:
>  - DataFrameReader
>  - DataFrameWriter
>  - ContinuousQuery
>  - Trigger



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14555) Python API for methods introduced for Structured Streaming

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14555:


Assignee: Apache Spark

> Python API for methods introduced for Structured Streaming
> --
>
> Key: SPARK-14555
> URL: https://issues.apache.org/jira/browse/SPARK-14555
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Streaming
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> Methods added for Structured Streaming don't have a Python API yet.
> We need to provide APIs for the new methods in:
>  - DataFrameReader
>  - DataFrameWriter
>  - ContinuousQuery
>  - Trigger



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14555) Python API for methods introduced for Structured Streaming

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236528#comment-15236528
 ] 

Apache Spark commented on SPARK-14555:
--

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/12320

> Python API for methods introduced for Structured Streaming
> --
>
> Key: SPARK-14555
> URL: https://issues.apache.org/jira/browse/SPARK-14555
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Streaming
>Reporter: Burak Yavuz
>
> Methods added for Structured Streaming don't have a Python API yet.
> We need to provide APIs for the new methods in:
>  - DataFrameReader
>  - DataFrameWriter
>  - ContinuousQuery
>  - Trigger



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14555) Python API for methods introduced for Structured Streaming

2016-04-11 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-14555:
---

 Summary: Python API for methods introduced for Structured Streaming
 Key: SPARK-14555
 URL: https://issues.apache.org/jira/browse/SPARK-14555
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL, Streaming
Reporter: Burak Yavuz


Methods added for Structured Streaming don't have a Python API yet.
We need to provide APIs for the new methods in:

 - DataFrameReader
 - DataFrameWriter
 - ContinuousQuery
 - Trigger



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14554) Dataset.map may generate wrong java code for wide table

2016-04-11 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-14554:
---

 Summary: Dataset.map may generate wrong java code for wide table
 Key: SPARK-14554
 URL: https://issues.apache.org/jira/browse/SPARK-14554
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11828) DAGScheduler source registered too early with MetricsSystem

2016-04-11 Thread Dubkov Mikhail (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236510#comment-15236510
 ] 

Dubkov Mikhail commented on SPARK-11828:


[~vanzin]

Could you please look into 
http://stackoverflow.com/questions/36133952/why-cant-i-run-spark-shell-with-yarn-in-client-mode/36561486#36561486

We get an NPE on a line you added. Do you have any ideas why it happens?

Thanks!

> DAGScheduler source registered too early with MetricsSystem
> ---
>
> Key: SPARK-11828
> URL: https://issues.apache.org/jira/browse/SPARK-11828
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.6.0
>
>
> I see this log message when starting apps on YARN:
> {quote}
> 15/11/18 13:12:56 WARN MetricsSystem: Using default name DAGScheduler for 
> source because spark.app.id is not set.
> {quote}
> That's because DAGScheduler registers itself with the metrics system in its 
> constructor, and the DAGScheduler is instantiated before "spark.app.id" is 
> set in the context's SparkConf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14550) OneHotEncoding wrapper in SparkR

2016-04-11 Thread Alok Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236497#comment-15236497
 ] 

Alok Singh commented on SPARK-14550:


Hi [~mengxr]

Exposing one-hot encoding in SparkR will help R users a lot, so I created this 
JIRA. Please let us know if you have any feedback.

thanks
Alok

> OneHotEncoding wrapper in SparkR
> 
>
> Key: SPARK-14550
> URL: https://issues.apache.org/jira/browse/SPARK-14550
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> Implement OneHotEncoding in SparkR.
> In R, one can usually use model.matrix, which accepts a formula, to do 
> one-hot encoding. I think we can support a simple formula here.
> model.matrix doc: 
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/model.matrix.html
> Here is an example of what would be nice to have:
> http://stackoverflow.com/questions/16200241/recode-categorical-factor-with-n-categories-into-n-binary-columns
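> For reference, a minimal Scala sketch of the Spark ML stages such a wrapper 
> would drive (assumes a DataFrame `df` with a string column "category"; column 
> names are illustrative):
> {code}
> import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
>
> // Index the string column, then expand the indices into binary vector columns.
> val indexer = new StringIndexer()
>   .setInputCol("category")
>   .setOutputCol("categoryIndex")
> val encoder = new OneHotEncoder()
>   .setInputCol("categoryIndex")
>   .setOutputCol("categoryVec")
>
> val indexed = indexer.fit(df).transform(df)
> val encoded = encoder.transform(indexed)
> {code}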



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14553) PCA wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236495#comment-15236495
 ] 

Alok Singh commented on SPARK-14553:


Hi [~mengxr] ,

Since all the ML APIs were expanded in 1.6, it would be nice to expose more 
algorithms in SparkR.

What do you think?

thanks
Alok

> PCA wrapper for SparkR
> --
>
> Key: SPARK-14553
> URL: https://issues.apache.org/jira/browse/SPARK-14553
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> Implement the SparkR wrapper for the PCA transformer
> https://spark.apache.org/docs/latest/ml-features.html#pca
> we should support an API similar to R's, i.e.
> feature <- prcomp(df,
>                   center = TRUE,
>                   scale. = TRUE)
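> For reference, a minimal Scala sketch of the ML PCA transformer the wrapper 
> would expose (assumes a DataFrame `df` with a Vector column "features"; k is 
> an example value):
> {code}
> import org.apache.spark.ml.feature.PCA
>
> val pca = new PCA()
>   .setInputCol("features")
>   .setOutputCol("pcaFeatures")
>   .setK(3)   // number of principal components to keep
>
> val projected = pca.fit(df).transform(df)
> {code}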



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14553) PCA wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Singh updated SPARK-14553:
---
Description: 
Implement the SparkR wrapper for the PCA transformer

https://spark.apache.org/docs/latest/ml-features.html#pca

we should support an API similar to R's, i.e.

feature <- prcomp(df,
                  center = TRUE,
                  scale. = TRUE)



  was:
Implement the SparkR wrapper for the PCA transformer

https://spark.apache.org/docs/latest/ml-features.html#pca

we should support an API similar to R's, i.e.

prcomp(log.ir,
       center = TRUE,
       scale. = TRUE)




> PCA wrapper for SparkR
> --
>
> Key: SPARK-14553
> URL: https://issues.apache.org/jira/browse/SPARK-14553
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> Implement the SparkR wrapper for the PCA transformer
> https://spark.apache.org/docs/latest/ml-features.html#pca
> we should support an API similar to R's, i.e.
> feature <- prcomp(df,
>                   center = TRUE,
>                   scale. = TRUE)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14553) PCA wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)
Alok Singh created SPARK-14553:
--

 Summary: PCA wrapper for SparkR
 Key: SPARK-14553
 URL: https://issues.apache.org/jira/browse/SPARK-14553
 Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
Reporter: Alok Singh


Implement the SparkR wrapper for the PCA transformer

https://spark.apache.org/docs/latest/ml-features.html#pca

we should support an API similar to R's, i.e.

prcomp(log.ir,
       center = TRUE,
       scale. = TRUE)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14552) ReValue wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Singh updated SPARK-14552:
---
Description: 
Implement the wrapper for VectorIndexer.

In R, with the plyr package, one can do the following:


x <- c("a", "b", "c")
revalue(x, c(a = "1", c = "2"))



  was:
Implement the wrapper for VectorIndexer.

In R, with the plyr package, one can do the following:

x <- c("a", "b", "c")
revalue(x, c(a = "1", c = "2"))




> ReValue wrapper for SparkR
> --
>
> Key: SPARK-14552
> URL: https://issues.apache.org/jira/browse/SPARK-14552
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> Implement the wrapper for VectorIndexer.
> In R, with the plyr package, one can do the following:
> x <- c("a", "b", "c")
> revalue(x, c(a = "1", c = "2"))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236487#comment-15236487
 ] 

Sean Owen commented on SPARK-14548:
---

It says these are non-standard though. I can understand supporting them to let 
some legacy SQL query run, but is it realistic to expect this is the only such 
issue? It doesn't seem worth supporting for its own sake, as it's confusing.

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.
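> A brief illustration of the proposed equivalences (table and column names are 
> example values; assumes a SparkSession `spark`):
> {code}
> spark.sql("SELECT * FROM t WHERE col !> 10")  // would behave like: col <= 10
> spark.sql("SELECT * FROM t WHERE col !< 10")  // would behave like: col >= 10
> {code}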



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14552) ReValue wrapper for SparkR

2016-04-11 Thread Alok Singh (JIRA)
Alok Singh created SPARK-14552:
--

 Summary: ReValue wrapper for SparkR
 Key: SPARK-14552
 URL: https://issues.apache.org/jira/browse/SPARK-14552
 Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
Reporter: Alok Singh


Implement the wrapper for VectorIndexer.

In R, with the plyr package, one can do the following:

x <- c("a", "b", "c")
revalue(x, c(a = "1", c = "2"))
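
For reference, a minimal Scala sketch of the revalue semantics on a DataFrame 
column, using when/otherwise (illustrative only, not the proposed SparkR API; 
assumes a SparkSession with `import spark.implicits._` in scope):

{code}
import org.apache.spark.sql.functions.when

// Remap "a" -> "1" and "c" -> "2", leaving other values unchanged.
val df = Seq("a", "b", "c").toDF("x")
val revalued = df.withColumn("x",
  when($"x" === "a", "1").when($"x" === "c", "2").otherwise($"x"))
{code}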





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236461#comment-15236461
 ] 

Xiao Li edited comment on SPARK-14548 at 4/12/16 2:42 AM:
--

MS SQL Server supports these operators. 
https://msdn.microsoft.com/en-us/library/ms188074.aspx


was (Author: smilegator):
https://msdn.microsoft.com/en-us/library/ms188074.aspx

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR

2016-04-11 Thread Narine Kokhlikyan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236484#comment-15236484
 ] 

Narine Kokhlikyan commented on SPARK-12922:
---

Thanks for the quick response, [~sunrui].

I was playing with KeyValueGroupedDataset and noticed that it works only for 
Datasets. When I try groupByKey on a DataFrame, it fails.
This succeeds:
val grouped = ds.groupByKey(v => (v._1, "word"))

But the following fails:
val grouped = df.groupByKey(v => (v._1, "word"))

As far as I know, in SparkR we are working with DataFrames, so does this mean 
that I need to convert the DataFrame to a Dataset and work on Datasets on the 
Scala side?

Thanks,
Narine




> Implement gapply() on DataFrame in SparkR
> -
>
> Key: SPARK-12922
> URL: https://issues.apache.org/jira/browse/SPARK-12922
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Sun Rui
>
> gapply() applies an R function to groups formed by one or more columns of a 
> DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() 
> in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
> {code}
> R function input: the grouping key's value and a local data.frame of the 
> grouped data.
> R function output: a local data.frame.
> The schema specifies the Row format of the output of the R function. It must 
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported; users 
> could do map-side combination via dapply().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14551:


Assignee: (was: Apache Spark)

> Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
> -
>
> Key: SPARK-14551
> URL: https://issues.apache.org/jira/browse/SPARK-14551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> When FileSourceStrategy is used, a record reader is created, which incurs a 
> NN call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up 
> reading the file information to get the ObjectInspector, which incurs an 
> additional NN call. It would be good to avoid this additional NN call 
> (specifically for partitioned datasets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14551:


Assignee: Apache Spark

> Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
> -
>
> Key: SPARK-14551
> URL: https://issues.apache.org/jira/browse/SPARK-14551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
>Assignee: Apache Spark
>Priority: Minor
>
> When FileSourceStrategy is used, a record reader is created, which incurs a 
> NN call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up 
> reading the file information to get the ObjectInspector, which incurs an 
> additional NN call. It would be good to avoid this additional NN call 
> (specifically for partitioned datasets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236478#comment-15236478
 ] 

Apache Spark commented on SPARK-14551:
--

User 'rajeshbalamohan' has created a pull request for this issue:
https://github.com/apache/spark/pull/12319

> Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
> -
>
> Key: SPARK-14551
> URL: https://issues.apache.org/jira/browse/SPARK-14551
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> When FileSourceStrategy is used, a record reader is created, which incurs a 
> NN call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up 
> reading the file information to get the ObjectInspector, which incurs an 
> additional NN call. It would be good to avoid this additional NN call 
> (specifically for partitioned datasets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236461#comment-15236461
 ] 

Xiao Li commented on SPARK-14548:
-

https://msdn.microsoft.com/en-us/library/ms188074.aspx

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14513) Threads left behind after stopping SparkContext

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14513:


Assignee: Apache Spark

> Threads left behind after stopping SparkContext
> ---
>
> Key: SPARK-14513
> URL: https://issues.apache.org/jira/browse/SPARK-14513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Terence Yim
>Assignee: Apache Spark
>
> After {{SparkContext}} is stopped, there are a couple of threads left behind. 
> After some digging, this is caused by a couple of bugs:
> 1. {{HttpBasedFileServer.shutdown()}} is not getting called during 
> {{NettyRpcEnv.shutdown()}}, hence a thread is left behind, blocked on the 
> {{ServerSocket.accept()}} of the underlying Jetty {{Server}}.
> 2. The {{QueuedThreadPool}}s created in the {{HttpServer}} and through the 
> {{JettyUtils.startJettyServer}} method are never stopped. This is because the 
> thread pool used by a Jetty {{Server}} is not closed automatically when the 
> {{Server}} is stopped.
> I'll send out a patch soon.
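> A minimal sketch of the second problem, assuming the Jetty 8-style API that 
> Spark used at the time (names are illustrative): per the report above, 
> stopping the Server does not stop a thread pool handed to it, so the pool 
> must be stopped explicitly.
> {code}
> import org.eclipse.jetty.server.Server
> import org.eclipse.jetty.util.thread.QueuedThreadPool
>
> val pool = new QueuedThreadPool()
> val server = new Server()
> server.setThreadPool(pool)
> server.start()
> // ... later, during shutdown:
> server.stop()
> pool.stop()   // without this, the pool's worker threads are left behind
> {code}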



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14513) Threads left behind after stopping SparkContext

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14513:


Assignee: (was: Apache Spark)

> Threads left behind after stopping SparkContext
> ---
>
> Key: SPARK-14513
> URL: https://issues.apache.org/jira/browse/SPARK-14513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Terence Yim
>
> After {{SparkContext}} is stopped, there are a couple of threads left behind. 
> After some digging, this is caused by a couple of bugs:
> 1. {{HttpBasedFileServer.shutdown()}} is not getting called during 
> {{NettyRpcEnv.shutdown()}}, hence a thread is left behind, blocked on the 
> {{ServerSocket.accept()}} of the underlying Jetty {{Server}}.
> 2. The {{QueuedThreadPool}}s created in the {{HttpServer}} and through the 
> {{JettyUtils.startJettyServer}} method are never stopped. This is because the 
> thread pool used by a Jetty {{Server}} is not closed automatically when the 
> {{Server}} is stopped.
> I'll send out a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14513) Threads left behind after stopping SparkContext

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236455#comment-15236455
 ] 

Apache Spark commented on SPARK-14513:
--

User 'chtyim' has created a pull request for this issue:
https://github.com/apache/spark/pull/12318

> Threads left behind after stopping SparkContext
> ---
>
> Key: SPARK-14513
> URL: https://issues.apache.org/jira/browse/SPARK-14513
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Terence Yim
>
> After {{SparkContext}} is stopped, there are a couple of threads left behind. 
> After some digging, this is caused by a couple of bugs:
> 1. {{HttpBasedFileServer.shutdown()}} is not getting called during 
> {{NettyRpcEnv.shutdown()}}, hence a thread is left behind, blocked on the 
> {{ServerSocket.accept()}} of the underlying Jetty {{Server}}.
> 2. The {{QueuedThreadPool}}s created in the {{HttpServer}} and through the 
> {{JettyUtils.startJettyServer}} method are never stopped. This is because the 
> thread pool used by a Jetty {{Server}} is not closed automatically when the 
> {{Server}} is stopped.
> I'll send out a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode

2016-04-11 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created SPARK-14551:


 Summary: Reduce number of NN calls in OrcRelation with 
FileSourceStrategy mode
 Key: SPARK-14551
 URL: https://issues.apache.org/jira/browse/SPARK-14551
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Rajesh Balamohan
Priority: Minor


When FileSourceStrategy is used, a record reader is created, which incurs a NN 
call internally. Later, in OrcRelation.unwrapOrcStructs, it ends up reading the 
file information to get the ObjectInspector, which incurs an additional NN 
call. It would be good to avoid this additional NN call (specifically for 
partitioned datasets).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236449#comment-15236449
 ] 

Sean Owen commented on SPARK-14548:
---

I've honestly never heard of these operators in any language. Does something 
support this syntax? Why would I write !> instead of the much more familiar <=?

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14520) ClasscastException thrown with spark.sql.parquet.enableVectorizedReader=true

2016-04-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14520.
-
   Resolution: Fixed
 Assignee: Liang-Chi Hsieh
Fix Version/s: 2.0.0

> ClasscastException thrown with spark.sql.parquet.enableVectorizedReader=true
> 
>
> Key: SPARK-14520
> URL: https://issues.apache.org/jira/browse/SPARK-14520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Rajesh Balamohan
>Assignee: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> Build details: Spark build from master branch (Apr-10)
> TPC-DS at 200 GB scale, stored in Parquet format in Hive.
> Ran TPC-DS Query27 via the Spark beeline client with 
> "spark.sql.sources.fileScan=false".
> {noformat}
>  java.lang.ClassCastException: 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader
>  cannot be cast to org.apache.parquet.hadoop.ParquetRecordReader
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:480)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:476)
> at 
> org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:161)
> at 
> org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:121)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
> at org.apache.spark.scheduler.Task.run(Task.scala:82)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:231)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Creating this JIRA as a placeholder to track this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236429#comment-15236429
 ] 

Apache Spark commented on SPARK-14549:
--

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/12317

> Copy the Vector and Matrix classes from mllib to ml in mllib-local
> --
>
> Key: SPARK-14549
> URL: https://issues.apache.org/jira/browse/SPARK-14549
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: DB Tsai
>Assignee: DB Tsai
>
> This task will copy the Vector and Matrix classes from mllib to the ml 
> package in the mllib-local jar. The UDTs and the `since` annotation in the ml 
> vector and matrix classes will be removed for now. UDTs will be achieved by 
> SPARK-14487, and `since` will be replaced by /* @since 1.2.0 */.
> The BLAS implementation will be copied, and some of the test utilities will 
> be copied as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14549:


Assignee: DB Tsai  (was: Apache Spark)

> Copy the Vector and Matrix classes from mllib to ml in mllib-local
> --
>
> Key: SPARK-14549
> URL: https://issues.apache.org/jira/browse/SPARK-14549
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: DB Tsai
>Assignee: DB Tsai
>
> This task will copy the Vector and Matrix classes from mllib to the ml 
> package in the mllib-local jar. The UDTs and the `since` annotation in the ml 
> vector and matrix classes will be removed for now. UDTs will be achieved by 
> SPARK-14487, and `since` will be replaced by /* @since 1.2.0 */.
> The BLAS implementation will be copied, and some of the test utilities will 
> be copied as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14549:


Assignee: Apache Spark  (was: DB Tsai)

> Copy the Vector and Matrix classes from mllib to ml in mllib-local
> --
>
> Key: SPARK-14549
> URL: https://issues.apache.org/jira/browse/SPARK-14549
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: DB Tsai
>Assignee: Apache Spark
>
> This task will copy the Vector and Matrix classes from the mllib to the ml 
> package in the mllib-local jar. The UDTs and the `since` annotation in the ml 
> vector and matrix classes will be removed for now. UDTs will be handled by 
> SPARK-14487, and `since` will be replaced by /* @since 1.2.0 */.
> The BLAS implementation will be copied, and some of the test utilities will 
> be copied as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14550) OneHotEncoding wrapper in SparkR

2016-04-11 Thread Alok Singh (JIRA)
Alok Singh created SPARK-14550:
--

 Summary: OneHotEncoding wrapper in SparkR
 Key: SPARK-14550
 URL: https://issues.apache.org/jira/browse/SPARK-14550
 Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
Reporter: Alok Singh


Implement OneHotEncoding in R.

In R, one can usually use model.matrix, which accepts a formula, to do one-hot 
encoding. I think we can support a simple formula here.

model.matrix doc: 
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/model.matrix.html

Here is an example that would be nice to support:
http://stackoverflow.com/questions/16200241/recode-categorical-factor-with-n-categories-into-n-binary-columns
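As a rough sketch (an assumption, not part of this ticket), the wrapper would 
delegate to the existing Spark ML StringIndexer + OneHotEncoder pipeline, which 
mirrors what model.matrix(~ category - 1) produces in R:

{code}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

// sqlContext is assumed to be in scope (e.g. in spark-shell)
val df = sqlContext.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "c"), (3, "a")
)).toDF("id", "category")

// StringIndexer maps each category string to a numeric index ...
val indexed = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
  .transform(df)

// ... and OneHotEncoder turns that index into a sparse binary vector.
val encoded = new OneHotEncoder()
  .setInputCol("categoryIndex")
  .setOutputCol("categoryVec")
  .transform(indexed)

encoded.show()
{code}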



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13352) BlockFetch does not scale well on large block

2016-04-11 Thread Zhang, Liye (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236426#comment-15236426
 ] 

Zhang, Liye commented on SPARK-13352:
-

[~davies], the last result for 500M should be 7.8 seconds, not 7.8 min, right?

> BlockFetch does not scale well on large block
> -
>
> Key: SPARK-13352
> URL: https://issues.apache.org/jira/browse/SPARK-13352
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Reporter: Davies Liu
>Assignee: Zhang, Liye
>Priority: Critical
> Fix For: 1.6.2, 2.0.0
>
>
> BlockManager.getRemoteBytes() performs poorly on large blocks
> {code}
>   test("block manager") {
>     val N = 500 << 20  // 500 MB
>     val bm = sc.env.blockManager
>     val blockId = TaskResultBlockId(0)
>     val buffer = ByteBuffer.allocate(N)
>     buffer.limit(N)
>     bm.putBytes(blockId, buffer, StorageLevel.MEMORY_AND_DISK_SER)
>     val result = bm.getRemoteBytes(blockId)
>     assert(result.isDefined)
>     assert(result.get.limit() === (N))
>   }
> {code}
> Here are runtime for different block sizes:
> {code}
> 50M    3 seconds
> 100M   7 seconds
> 250M   33 seconds
> 500M   2 min
> {code}
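For context, a simple harness like the following (an assumption, not from the 
ticket) reproduces the kind of measurement above:

{code}
def time[T](label: String)(f: => T): T = {
  val start = System.nanoTime()
  val result = f
  println(s"$label took ${(System.nanoTime() - start) / 1e9} seconds")
  result
}
// e.g. time("getRemoteBytes 500M")(bm.getRemoteBytes(blockId))
{code}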



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local

2016-04-11 Thread DB Tsai (JIRA)
DB Tsai created SPARK-14549:
---

 Summary: Copy the Vector and Matrix classes from mllib to ml in 
mllib-local
 Key: SPARK-14549
 URL: https://issues.apache.org/jira/browse/SPARK-14549
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: DB Tsai
Assignee: DB Tsai


This task will copy the Vector and Matrix classes from the mllib to the ml package 
in the mllib-local jar. The UDTs and the `since` annotation in the ml vector and 
matrix classes will be removed for now. UDTs will be handled by SPARK-14487, and 
`since` will be replaced by /* @since 1.2.0 */.

The BLAS implementation will be copied, and some of the test utilities will be 
copied as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236416#comment-15236416
 ] 

Apache Spark commented on SPARK-14548:
--

User 'jliwork' has created a pull request for this issue:
https://github.com/apache/spark/pull/12316

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14548:


Assignee: (was: Apache Spark)

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14548:


Assignee: Apache Spark

> Support !> and !< operator in Spark SQL
> ---
>
> Key: SPARK-14548
> URL: https://issues.apache.org/jira/browse/SPARK-14548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jia Li
>Assignee: Apache Spark
>Priority: Minor
>
> !< means "not less than", which is equivalent to >=.
> !> means "not greater than", which is equivalent to <=.
> I'd like to create a PR to support these two operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14475) Propagate user-defined context from driver to executors

2016-04-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14475.
-
   Resolution: Fixed
 Assignee: Eric Liang
Fix Version/s: 2.0.0

> Propagate user-defined context from driver to executors
> ---
>
> Key: SPARK-14475
> URL: https://issues.apache.org/jira/browse/SPARK-14475
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Eric Liang
>Assignee: Eric Liang
> Fix For: 2.0.0
>
>
> It would be useful (e.g. for tracing) to automatically propagate arbitrary 
> user-defined context (i.e. thread-locals) from the driver to executors. We 
> can do this easily by adding sc.localProperties to TaskContext.
> cc [~joshrosen]
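A minimal sketch of what the change enables (the property key is made up for 
illustration):

{code}
// On the driver: attach a thread-local property to subsequent jobs.
sc.setLocalProperty("trace.id", "abc-123")

// In tasks on executors: read it back via TaskContext.
sc.parallelize(1 to 4, 2).foreach { _ =>
  val traceId = org.apache.spark.TaskContext.get().getLocalProperty("trace.id")
  println(s"trace.id = $traceId")
}
{code}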



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14547) Avoid DNS resolution for reusing connections

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14547:


Assignee: Reynold Xin  (was: Apache Spark)

> Avoid DNS resolution for reusing connections
> 
>
> Key: SPARK-14547
> URL: https://issues.apache.org/jira/browse/SPARK-14547
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14547) Avoid DNS resolution for reusing connections

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14547:


Assignee: Apache Spark  (was: Reynold Xin)

> Avoid DNS resolution for reusing connections
> 
>
> Key: SPARK-14547
> URL: https://issues.apache.org/jira/browse/SPARK-14547
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14547) Avoid DNS resolution for reusing connections

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236402#comment-15236402
 ] 

Apache Spark commented on SPARK-14547:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12315

> Avoid DNS resolution for reusing connections
> 
>
> Key: SPARK-14547
> URL: https://issues.apache.org/jira/browse/SPARK-14547
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14548) Support !> and !< operator in Spark SQL

2016-04-11 Thread Jia Li (JIRA)
Jia Li created SPARK-14548:
--

 Summary: Support !> and !< operator in Spark SQL
 Key: SPARK-14548
 URL: https://issues.apache.org/jira/browse/SPARK-14548
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Jia Li
Priority: Minor


!< means "not less than", which is equivalent to >=.
!> means "not greater than", which is equivalent to <=.

I'd like to create a PR to support these two operators.
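Hypothetical usage once the parser accepts them (the table and column are made 
up for illustration):

{code}
sqlContext.sql("SELECT name FROM people WHERE age !> 30")  // same as age <= 30
sqlContext.sql("SELECT name FROM people WHERE age !< 30")  // same as age >= 30
{code}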



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14547) Avoid DNS resolution for reusing connections

2016-04-11 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-14547:
---

 Summary: Avoid DNS resolution for reusing connections
 Key: SPARK-14547
 URL: https://issues.apache.org/jira/browse/SPARK-14547
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14546) Scale Wrapper in SparkR

2016-04-11 Thread Alok Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alok Singh updated SPARK-14546:
---
Summary: Scale Wrapper in SparkR  (was: Scale Wrapper )

> Scale Wrapper in SparkR
> ---
>
> Key: SPARK-14546
> URL: https://issues.apache.org/jira/browse/SPARK-14546
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Alok Singh
>
> ML has StandardScaler, which seems to be very commonly used.
> This JIRA is to implement the SparkR wrapper for it.
> Here is the R scale command
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14546) Scale Wrapper

2016-04-11 Thread Alok Singh (JIRA)
Alok Singh created SPARK-14546:
--

 Summary: Scale Wrapper 
 Key: SPARK-14546
 URL: https://issues.apache.org/jira/browse/SPARK-14546
 Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
Reporter: Alok Singh


ML has StandardScaler, which seems to be very commonly used.
This JIRA is to implement the SparkR wrapper for it.

Here is the R scale command

https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html
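A minimal sketch of the Spark ML StandardScaler the wrapper would call into, 
assuming a DataFrame df with a Vector column named "features":

{code}
import org.apache.spark.ml.feature.StandardScaler

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)  // center columns, like R's scale(x, center = TRUE)
  .setWithStd(true)   // scale to unit standard deviation, like scale = TRUE
val scaled = scaler.fit(df).transform(df)
{code}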




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14441) Consolidate DDL tests

2016-04-11 Thread Bo Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236359#comment-15236359
 ] 

Bo Meng commented on SPARK-14441:
-

I think DDLSuite and DDLCommandSuite can be combined into one, as can 
HiveDDLSuite and HiveDDLCommandSuite, since they are just testing different 
stages.
If you agree, I will make the changes.


> Consolidate DDL tests
> -
>
> Key: SPARK-14441
> URL: https://issues.apache.org/jira/browse/SPARK-14441
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing 
> whether a test should exist in one or the other. It also makes it less clear 
> whether our test coverage is comprehensive. Ideally we should consolidate 
> these files as much as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table

2016-04-11 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236353#comment-15236353
 ] 

Adrian Wang commented on SPARK-14126:
-

I'm working on this.

> [Table related commands] Truncate table
> ---
>
> Key: SPARK-14126
> URL: https://issues.apache.org/jira/browse/SPARK-14126
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>
> TOK_TRUNCATETABLE
> We also need to check the behavior of Hive when we call truncate table on a 
> partitioned table.
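A minimal example of the command this sub-task covers; the partitioned form 
(Hive syntax) is the case whose semantics still need checking:

{code}
sqlContext.sql("TRUNCATE TABLE my_table")
sqlContext.sql("TRUNCATE TABLE my_table PARTITION (ds = '2016-04-11')")
{code}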



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14414) Make error messages consistent across DDLs

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236336#comment-15236336
 ] 

Apache Spark commented on SPARK-14414:
--

User 'bomeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/12314

> Make error messages consistent across DDLs
> --
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> There are many different error messages right now when the user tries to run 
> something that's not supported. We might throw AnalysisException or 
> ParseException or NoSuchFunctionException etc. We should make all of these 
> consistent before 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14414) Make error messages consistent across DDLs

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14414:


Assignee: Andrew Or  (was: Apache Spark)

> Make error messages consistent across DDLs
> --
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> There are many different error messages right now when the user tries to run 
> something that's not supported. We might throw AnalysisException or 
> ParseException or NoSuchFunctionException etc. We should make all of these 
> consistent before 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14414) Make error messages consistent across DDLs

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14414:


Assignee: Apache Spark  (was: Andrew Or)

> Make error messages consistent across DDLs
> --
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>
> There are many different error messages right now when the user tries to run 
> something that's not supported. We might throw AnalysisException or 
> ParseException or NoSuchFunctionException etc. We should make all of these 
> consistent before 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14543) SQL/Hive insertInto has unexpected results

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14543:


Assignee: (was: Apache Spark)

> SQL/Hive insertInto has unexpected results
> --
>
> Key: SPARK-14543
> URL: https://issues.apache.org/jira/browse/SPARK-14543
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ryan Blue
>
> The Hive write path adds a pre-insertion cast (projection) to reconcile 
> incoming data columns with the outgoing table schema. Columns are matched by 
> position and casts are inserted to reconcile the two column schemas.
> When columns aren't correctly aligned, this causes unexpected results. I ran 
> into this by not using a correct {{partitionBy}} call (addressed by 
> SPARK-14459), which caused an error message that an int could not be cast to 
> an array. However, if the columns are vaguely compatible, for example string 
> and float, then no error or warning is produced and data is written to the 
> wrong columns using unexpected casts (string -> bigint -> float).
> A real-world use case that will hit this is when a table definition changes 
> by adding a column in the middle of a table. Spark SQL statements that copied 
> from that table to a destination table will then map the columns differently 
> but insert casts that mask the problem. The last column's data will be 
> dropped without a reliable warning for the user.
> This highlights a few problems:
> * Too many or too few incoming data columns should cause an AnalysisException 
> to be thrown
> * Only "safe" casts should be inserted automatically, like int -> long, using 
> UpCast
> * Pre-insertion casts currently ignore extra columns by using zip
> * The pre-insertion cast logic differs between Hive's MetastoreRelation and 
> LogicalRelation
> Also, I think there should be an option to match input data to output columns 
> by name. The API allows operations on tables, which hide the column 
> resolution problem. It's easy to copy from one table to another without 
> listing the columns, and in the API it is common to work with columns by name 
> rather than by position. I think the API should add a way to match columns by 
> name, which is closer to what users expect. I propose adding something like 
> this:
> {code}
> CREATE TABLE src (id bigint, count int, total bigint)
> CREATE TABLE dst (id bigint, total bigint, count int)
> sqlContext.table("src").write.byName.insertInto("dst")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14543) SQL/Hive insertInto has unexpected results

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14543:


Assignee: Apache Spark

> SQL/Hive insertInto has unexpected results
> --
>
> Key: SPARK-14543
> URL: https://issues.apache.org/jira/browse/SPARK-14543
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ryan Blue
>Assignee: Apache Spark
>
> The Hive write path adds a pre-insertion cast (projection) to reconcile 
> incoming data columns with the outgoing table schema. Columns are matched by 
> position and casts are inserted to reconcile the two column schemas.
> When columns aren't correctly aligned, this causes unexpected results. I ran 
> into this by not using a correct {{partitionBy}} call (addressed by 
> SPARK-14459), which caused an error message that an int could not be cast to 
> an array. However, if the columns are vaguely compatible, for example string 
> and float, then no error or warning is produced and data is written to the 
> wrong columns using unexpected casts (string -> bigint -> float).
> A real-world use case that will hit this is when a table definition changes 
> by adding a column in the middle of a table. Spark SQL statements that copied 
> from that table to a destination table will then map the columns differently 
> but insert casts that mask the problem. The last column's data will be 
> dropped without a reliable warning for the user.
> This highlights a few problems:
> * Too many or too few incoming data columns should cause an AnalysisException 
> to be thrown
> * Only "safe" casts should be inserted automatically, like int -> long, using 
> UpCast
> * Pre-insertion casts currently ignore extra columns by using zip
> * The pre-insertion cast logic differs between Hive's MetastoreRelation and 
> LogicalRelation
> Also, I think there should be an option to match input data to output columns 
> by name. The API allows operations on tables, which hide the column 
> resolution problem. It's easy to copy from one table to another without 
> listing the columns, and in the API it is common to work with columns by name 
> rather than by position. I think the API should add a way to match columns by 
> name, which is closer to what users expect. I propose adding something like 
> this:
> {code}
> CREATE TABLE src (id bigint, count int, total bigint)
> CREATE TABLE dst (id bigint, total bigint, count int)
> sqlContext.table("src").write.byName.insertInto("dst")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14543) SQL/Hive insertInto has unexpected results

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236315#comment-15236315
 ] 

Apache Spark commented on SPARK-14543:
--

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/12313

> SQL/Hive insertInto has unexpected results
> --
>
> Key: SPARK-14543
> URL: https://issues.apache.org/jira/browse/SPARK-14543
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ryan Blue
>
> The Hive write path adds a pre-insertion cast (projection) to reconcile 
> incoming data columns with the outgoing table schema. Columns are matched by 
> position and casts are inserted to reconcile the two column schemas.
> When columns aren't correctly aligned, this causes unexpected results. I ran 
> into this by not using a correct {{partitionBy}} call (addressed by 
> SPARK-14459), which caused an error message that an int could not be cast to 
> an array. However, if the columns are vaguely compatible, for example string 
> and float, then no error or warning is produced and data is written to the 
> wrong columns using unexpected casts (string -> bigint -> float).
> A real-world use case that will hit this is when a table definition changes 
> by adding a column in the middle of a table. Spark SQL statements that copied 
> from that table to a destination table will then map the columns differently 
> but insert casts that mask the problem. The last column's data will be 
> dropped without a reliable warning for the user.
> This highlights a few problems:
> * Too many or too few incoming data columns should cause an AnalysisException 
> to be thrown
> * Only "safe" casts should be inserted automatically, like int -> long, using 
> UpCast
> * Pre-insertion casts currently ignore extra columns by using zip
> * The pre-insertion cast logic differs between Hive's MetastoreRelation and 
> LogicalRelation
> Also, I think there should be an option to match input data to output columns 
> by name. The API allows operations on tables, which hide the column 
> resolution problem. It's easy to copy from one table to another without 
> listing the columns, and in the API it is common to work with columns by name 
> rather than by position. I think the API should add a way to match columns by 
> name, which is closer to what users expect. I propose adding something like 
> this:
> {code}
> CREATE TABLE src (id bigint, count int, total bigint)
> CREATE TABLE dst (id bigint, total bigint, count int)
> sqlContext.table("src").write.byName.insertInto("dst")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule

2016-04-11 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-14545:
--
Description: 
Current `LikeSimplification` handles the following four rules.
- 'a%' => expr.StartsWith("a")
- '%b' => expr.EndsWith("b")
- '%a%' => expr.Contains("a")
- 'a' => EqualTo("a")

This issue adds the following rule.
- 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b")
Here, 2 is statically calculated from "a".size + "b".size.

  was:
Current `LikeSimplification` handles the following four rules.
- 'a%' => expr.StartsWith("a")
- '%b' => expr.EndsWith("b")
- '%a%' => expr.Contains("a")
- 'a' => EqualTo("a")

This issue adds the following rule.
- 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b")
Here, 2 is statically calculated from "a".size + "b".size.


> Improve `LikeSimplification` by adding `a%b` rule
> -
>
> Key: SPARK-14545
> URL: https://issues.apache.org/jira/browse/SPARK-14545
> Project: Spark
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>
> Current `LikeSimplification` handles the following four rules.
> - 'a%' => expr.StartsWith("a")
> - '%b' => expr.EndsWith("b")
> - '%a%' => expr.Contains("a")
> - 'a' => EqualTo("a")
> This issue adds the following rule.
> - 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b")
> Here, 2 is statically calculated from "a".size + "b".size.
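For illustration only (not the actual Catalyst rule), the proposed rewrite 
behaves like this DataFrame expression, assuming df has a string column "name":

{code}
import org.apache.spark.sql.functions.length
import sqlContext.implicits._  // for the $"..." column syntax

// df.filter("name LIKE 'a%b'") would be simplified to the equivalent of:
df.filter(length($"name") >= 2 && $"name".startsWith("a") && $"name".endsWith("b"))
{code}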



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14545:


Assignee: (was: Apache Spark)

> Improve `LikeSimplification` by adding `a%b` rule
> -
>
> Key: SPARK-14545
> URL: https://issues.apache.org/jira/browse/SPARK-14545
> Project: Spark
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>
> Current `LikeSimplification` handles the following four rules.
> - 'a%' => expr.StartsWith("a")
> - '%b' => expr.EndsWith("b")
> - '%a%' => expr.Contains("a")
> - 'a' => EqualTo("a")
> This issue adds the following rule.
> - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b")
> Here, 2 is statically calculated from "a".size + "b".size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14545:


Assignee: Apache Spark

> Improve `LikeSimplification` by adding `a%b` rule
> -
>
> Key: SPARK-14545
> URL: https://issues.apache.org/jira/browse/SPARK-14545
> Project: Spark
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> Current `LikeSimplification` handles the following four rules.
> - 'a%' => expr.StartsWith("a")
> - '%b' => expr.EndsWith("b")
> - '%a%' => expr.Contains("a")
> - 'a' => EqualTo("a")
> This issue adds the following rule.
> - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b")
> Here, 2 is statically calculated from "a".size + "b".size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236293#comment-15236293
 ] 

Apache Spark commented on SPARK-14545:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/12312

> Improve `LikeSimplification` by adding `a%b` rule
> -
>
> Key: SPARK-14545
> URL: https://issues.apache.org/jira/browse/SPARK-14545
> Project: Spark
>  Issue Type: New Feature
>  Components: Optimizer
>Reporter: Dongjoon Hyun
>
> Current `LikeSimplification` handles the following four rules.
> - 'a%' => expr.StartsWith("a")
> - '%b' => expr.EndsWith("b")
> - '%a%' => expr.Contains("a")
> - 'a' => EqualTo("a")
> This issue adds the following rule.
> - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b")
> Here, 2 is statically calculated from "a".size + "b".size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule

2016-04-11 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-14545:
-

 Summary: Improve `LikeSimplification` by adding `a%b` rule
 Key: SPARK-14545
 URL: https://issues.apache.org/jira/browse/SPARK-14545
 Project: Spark
  Issue Type: New Feature
  Components: Optimizer
Reporter: Dongjoon Hyun


Current `LikeSimplification` handles the following four rules.
- 'a%' => expr.StartsWith("a")
- '%b' => expr.EndsWith("b")
- '%a%' => expr.Contains("a")
- 'a' => EqualTo("a")

This issue adds the following rule.
- 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b")
Here, 2 is statically calculated from "a".size + "b".size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14544) Spark UI is very slow in recent Chrome

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14544:


Assignee: Davies Liu  (was: Apache Spark)

> Spark UI is very slow in recent Chrome
> --
>
> Key: SPARK-14544
> URL: https://issues.apache.org/jira/browse/SPARK-14544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Once I ran a complicated query or there were many queries in the SQL tab, the 
> page was really slow in Chrome 49, but fast in Safari/Firefox.
> Given that many users are using Chrome, we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14544) Spark UI is very slow in recent Chrome

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236211#comment-15236211
 ] 

Apache Spark commented on SPARK-14544:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/12311

> Spark UI is very slow in recent Chrome
> --
>
> Key: SPARK-14544
> URL: https://issues.apache.org/jira/browse/SPARK-14544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Once I ran a complicated query or there were many queries in the SQL tab, the 
> page was really slow in Chrome 49, but fast in Safari/Firefox.
> Given that many users are using Chrome, we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14544) Spark UI is very slow in recent Chrome

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14544:


Assignee: Apache Spark  (was: Davies Liu)

> Spark UI is very slow in recent Chrome
> --
>
> Key: SPARK-14544
> URL: https://issues.apache.org/jira/browse/SPARK-14544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> Once I ran a complicated query or there were many queries in the SQL tab, the 
> page was really slow in Chrome 49, but fast in Safari/Firefox.
> Given that many users are using Chrome, we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10521:
---
Assignee: Luciano Resende

> Utilize Docker to test DB2 JDBC Dialect support
> ---
>
> Key: SPARK-10521
> URL: https://issues.apache.org/jira/browse/SPARK-10521
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Luciano Resende
>Assignee: Luciano Resende
> Fix For: 2.0.0
>
>
> There was a discussion in SPARK-10170 around using a docker image to execute 
> the DB2 JDBC dialect tests. I will use this jira to work on providing the 
> basic image together with the test integration. We can then extend the 
> testing coverage as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-10521.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 9893
[https://github.com/apache/spark/pull/9893]

> Utilize Docker to test DB2 JDBC Dialect support
> ---
>
> Key: SPARK-10521
> URL: https://issues.apache.org/jira/browse/SPARK-10521
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Luciano Resende
> Fix For: 2.0.0
>
>
> There was a discussion in SPARK-10170 around using a docker image to execute 
> the DB2 JDBC dialect tests. I will use this jira to work on providing the 
> basic image together with the test integration. We can then extend the 
> testing coverage as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14401) Switch to stock sbt-pom-reader plugin

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14401:


Assignee: Apache Spark

> Switch to stock sbt-pom-reader plugin
> -
>
> Key: SPARK-14401
> URL: https://issues.apache.org/jira/browse/SPARK-14401
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> Spark currently depends on a forked version of {{sbt-pom-reader}} which we 
> build from source. It would be great to port our modifications to the 
> upstream project so that we can migrate to the official version and stop 
> maintaining our fork.
> [~scrapco...@gmail.com], could you edit this ticket to fill in more detail 
> about which custom changes have not been ported yet?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1844) Support maven-style dependency resolution in sbt build

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-1844:
---

Assignee: Apache Spark  (was: Josh Rosen)

> Support maven-style dependency resolution in sbt build
> --
>
> Key: SPARK-1844
> URL: https://issues.apache.org/jira/browse/SPARK-1844
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Apache Spark
>
> [Currently this is a brainstorm/wish - not sure it's possible]
> Ivy/sbt and maven use fundamentally different strategies when transitive 
> dependencies conflict (i.e. when we have two copies of library Y in our 
> dependency graph on different versions).
> This actually means our sbt and maven builds have been divergent for a long 
> time.
> Ivy/sbt have a pluggable notion of a [conflict 
> manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java].
>  The default chooses the newest version of the dependency. SBT [allows this 
> to be 
> changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager]
>  though.
> Maven employs the [nearest 
> wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/]
>  policy, which means the version closest to the project root is chosen.
> It would be nice to be able to have matching semantics in the builds. We 
> could do this by writing a conflict manager in sbt that mimics Maven's 
> behavior. The fact that IVY-813 has existed for 6 years without anyone doing 
> this makes me wonder if that is not possible or very hard :P
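For reference, sbt's built-in knob mentioned above looks like this in build.sbt 
(sbt 0.13); a Maven-style "nearest wins" policy has no stock equivalent:

{code}
// build.sbt
conflictManager := ConflictManager.latestRevision  // the default: newest version wins
// Mimicking Maven would require a custom Ivy ConflictManager, per the wish above.
{code}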



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14401) Switch to stock sbt-pom-reader plugin

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236204#comment-15236204
 ] 

Apache Spark commented on SPARK-14401:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/12310

> Switch to stock sbt-pom-reader plugin
> -
>
> Key: SPARK-14401
> URL: https://issues.apache.org/jira/browse/SPARK-14401
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Spark currently depends on a forked version of {{sbt-pom-reader}} which we 
> build from source. It would be great to port our modifications to the 
> upstream project so that we can migrate to the official version and stop 
> maintaining our fork.
> [~scrapco...@gmail.com], could you edit this ticket to fill in more detail 
> about which custom changes have not been ported yet?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1844) Support maven-style dependency resolution in sbt build

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-1844:
---

Assignee: Josh Rosen  (was: Apache Spark)

> Support maven-style dependency resolution in sbt build
> --
>
> Key: SPARK-1844
> URL: https://issues.apache.org/jira/browse/SPARK-1844
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
>
> [Currently this is a brainstorm/wish - not sure it's possible]
> Ivy/sbt and maven use fundamentally different strategies when transitive 
> dependencies conflict (i.e. when we have two copies of library Y in our 
> dependency graph on different versions).
> This actually means our sbt and maven builds have been divergent for a long 
> time.
> Ivy/sbt have a pluggable notion of a [conflict 
> manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java].
>  The default chooses the newest version of the dependency. SBT [allows this 
> to be 
> changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager]
>  though.
> Maven employs the [nearest 
> wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/]
>  policy, which means the version closest to the project root is chosen.
> It would be nice to be able to have matching semantics in the builds. We 
> could do this by writing a conflict manager in sbt that mimics Maven's 
> behavior. The fact that IVY-813 has existed for 6 years without anyone doing 
> this makes me wonder if that is not possible or very hard :P



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14401) Switch to stock sbt-pom-reader plugin

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14401:


Assignee: (was: Apache Spark)

> Switch to stock sbt-pom-reader plugin
> -
>
> Key: SPARK-14401
> URL: https://issues.apache.org/jira/browse/SPARK-14401
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Spark currently depends on a forked version of {{sbt-pom-reader}} which we 
> build from source. It would be great to port our modifications to the 
> upstream project so that we can migrate to the official version and stop 
> maintaining our fork.
> [~scrapco...@gmail.com], could you edit this ticket to fill in more detail 
> about which custom changes have not been ported yet?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1844) Support maven-style dependency resolution in sbt build

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236205#comment-15236205
 ] 

Apache Spark commented on SPARK-1844:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/12310

> Support maven-style dependency resolution in sbt build
> --
>
> Key: SPARK-1844
> URL: https://issues.apache.org/jira/browse/SPARK-1844
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
>
> [Currently this is a brainstorm/wish - not sure it's possible]
> Ivy/sbt and maven use fundamentally different strategies when transitive 
> dependencies conflict (i.e. when we have two copies of library Y in our 
> dependency graph on different versions).
> This actually means our sbt and maven builds have been divergent for a long 
> time.
> Ivy/sbt have a pluggable notion of a [conflict 
> manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java].
>  The default chooses the newest version of the dependency. SBT [allows this 
> to be 
> changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager]
>  though.
> Maven employs the [nearest 
> wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/]
>  policy, which means the version closest to the project root is chosen.
> It would be nice to be able to have matching semantics in the builds. We 
> could do this by writing a conflict manager in sbt that mimics Maven's 
> behavior. The fact that IVY-813 has existed for 6 years without anyone doing 
> this makes me wonder if that is not possible or very hard :P



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-1844) Support maven-style dependency resolution in sbt build

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reopened SPARK-1844:
---
  Assignee: Josh Rosen  (was: Prashant Sharma)

> Support maven-style dependency resolution in sbt build
> --
>
> Key: SPARK-1844
> URL: https://issues.apache.org/jira/browse/SPARK-1844
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
>
> [Currently this is a brainstorm/wish - not sure it's possible]
> Ivy/sbt and maven use fundamentally different strategies when transitive 
> dependencies conflict (i.e. when we have two copies of library Y in our 
> dependency graph on different versions).
> This actually means our sbt and maven builds have been divergent for a long 
> time.
> Ivy/sbt have a pluggable notion of a [conflict 
> manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java].
>  The default chooses the newest version of the dependency. SBT [allows this 
> to be 
> changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager]
>  though.
> Maven employs the [nearest 
> wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/]
>  policy, which means the version closest to the project root is chosen.
> It would be nice to be able to have matching semantics in the builds. We 
> could do this by writing a conflict manager in sbt that mimics Maven's 
> behavior. The fact that IVY-813 has existed for 6 years without anyone doing 
> this makes me wonder if that is not possible or very hard :P



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14401) Switch to stock sbt-pom-reader plugin

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14401:
---
Summary: Switch to stock sbt-pom-reader plugin  (was: Merge our 
sbt-pom-reader changes upstream)

> Switch to stock sbt-pom-reader plugin
> -
>
> Key: SPARK-14401
> URL: https://issues.apache.org/jira/browse/SPARK-14401
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>
> Spark currently depends on a forked version of {{sbt-pom-reader}} which we 
> build from source. It would be great to port our modifications to the 
> upstream project so that we can migrate to the official version and stop 
> maintaining our fork.
> [~scrapco...@gmail.com], could you edit this ticket to fill in more detail 
> about which custom changes have not been ported yet?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14544) Spark UI is very slow in recent Chrome

2016-04-11 Thread Davies Liu (JIRA)
Davies Liu created SPARK-14544:
--

 Summary: Spark UI is very slow in recent Chrome
 Key: SPARK-14544
 URL: https://issues.apache.org/jira/browse/SPARK-14544
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu


Once I ran a complicated query or there were many queries in the SQL tab, the 
page was really slow in Chrome 49, but fast in Safari/Firefox.

Given that many users are using Chrome, we should fix that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14298) LDA should support disable checkpoint

2016-04-11 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-14298.
---
   Resolution: Fixed
Fix Version/s: 1.6.2
   1.5.3

> LDA should support disable checkpoint
> -
>
> Key: SPARK-14298
> URL: https://issues.apache.org/jira/browse/SPARK-14298
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 1.5.3, 1.6.2, 2.0.0
>
>
> LDA should support disabling checkpointing by setting checkpointInterval = -1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14542:


Assignee: (was: Apache Spark)

> PipeRDD should allow configurable buffer size for the stdin writer 
> ---
>
> Key: SPARK-14542
> URL: https://issues.apache.org/jira/browse/SPARK-14542
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>Priority: Minor
>
> Currently PipedRDD internally uses PrintWriter to write data to the stdin of 
> the piped process, which by default uses a BufferedWriter of buffer size 8k. 
> In our experiments, we have seen that the 8k buffer size is too small and the job 
> spends a significant amount of CPU time in system calls to copy the data. We 
> should have a way to configure the buffer size for the writer. 
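A minimal sketch of the idea (the helper and default size are assumptions, not 
Spark's actual API):

{code}
import java.io.{BufferedWriter, OutputStreamWriter}

// Wrap the child process's stdin in a writer with a configurable buffer,
// instead of PrintWriter's default 8 KB.
def stdinWriter(proc: Process, bufferSizeBytes: Int = 64 * 1024): BufferedWriter =
  new BufferedWriter(new OutputStreamWriter(proc.getOutputStream, "UTF-8"), bufferSizeBytes)
{code}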



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer

2016-04-11 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14542:


Assignee: Apache Spark

> PipeRDD should allow configurable buffer size for the stdin writer 
> ---
>
> Key: SPARK-14542
> URL: https://issues.apache.org/jira/browse/SPARK-14542
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>Assignee: Apache Spark
>Priority: Minor
>
> Currently PipedRDD internally uses PrintWriter to write data to the stdin of 
> the piped process, which by default uses a BufferedWriter of buffer size 8k. 
> In our experiments, we have seen that the 8k buffer size is too small and the job 
> spends a significant amount of CPU time in system calls to copy the data. We 
> should have a way to configure the buffer size for the writer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer

2016-04-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236156#comment-15236156
 ] 

Apache Spark commented on SPARK-14542:
--

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/12309

> PipeRDD should allow configurable buffer size for the stdin writer 
> ---
>
> Key: SPARK-14542
> URL: https://issues.apache.org/jira/browse/SPARK-14542
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.1
>Reporter: Sital Kedia
>Priority: Minor
>
> Currently PipedRDD internally uses PrintWriter to write data to the stdin of 
> the piped process, which by default uses a BufferedWriter of buffer size 8k. 
> In our experiments, we have seen that the 8k buffer size is too small and the job 
> spends a significant amount of CPU time in system calls to copy the data. We 
> should have a way to configure the buffer size for the writer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14543) SQL/Hive insertInto has unexpected results

2016-04-11 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-14543:
-

 Summary: SQL/Hive insertInto has unexpected results
 Key: SPARK-14543
 URL: https://issues.apache.org/jira/browse/SPARK-14543
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Ryan Blue


The Hive write path adds a pre-insertion cast (projection) to reconcile 
incoming data columns with the outgoing table schema. Columns are matched by 
position and casts are inserted to reconcile the two column schemas.

When columns aren't correctly aligned, this causes unexpected results. I ran 
into this by using an incorrect {{partitionBy}} call (addressed by 
SPARK-14459), which produced an error saying an int could not be cast to an 
array. However, if the columns are vaguely compatible, for example string and 
float, then no error or warning is produced and data is written to the wrong 
columns via unexpected casts (string -> bigint -> float).

A real-world use case that will hit this is when a table definition changes by 
adding a column in the middle of a table. Spark SQL statements that copy from 
that table to a destination table will then map the columns differently but 
insert casts that mask the problem. The last column's data will be dropped 
without a reliable warning for the user.

This highlights a few problems:
* Too many or too few incoming data columns should cause an AnalysisException 
to be thrown
* Only "safe" casts should be inserted automatically, like int -> long, using 
UpCast
* Pre-insertion casts currently ignore extra columns by using zip
* The pre-insertion cast logic differs between Hive's MetastoreRelation and 
LogicalRelation

Also, I think there should be an option to match input data to output columns 
by name. The API allows operations on whole tables, which hides the column 
resolution problem: it's easy to copy from one table to another without listing 
the columns, and in the API it is common to work with columns by name rather 
than by position. I think the API should add a way to match columns by name, 
which is closer to what users expect. I propose adding something like this:

{code}
CREATE TABLE src (id bigint, count int, total bigint)
CREATE TABLE dst (id bigint, total bigint, count int)

sqlContext.table("src").write.byName.insertInto("dst")
{code}
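
Until something like {{byName}} exists, a defensive workaround under today's 
positional semantics is to align the source columns with the destination 
schema explicitly. A minimal sketch (assuming the two tables above and a plain 
{{sqlContext}}):

{code}
// Look up dst's column order and project src into it by name before the
// positional insert. Purely illustrative; byName itself is only a proposal.
val dstCols = sqlContext.table("dst").columns // Array("id", "total", "count")
sqlContext.table("src")
  .selectExpr(dstCols: _*)
  .write
  .insertInto("dst")
{code}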




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14521:
---
Priority: Blocker  (was: Critical)

> StackOverflowError in Kryo when executing TPC-DS Query27
> 
>
> Key: SPARK-14521
> URL: https://issues.apache.org/jira/browse/SPARK-14521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Rajesh Balamohan
>Priority: Blocker
>
> Build details: Spark build from master branch (Apr-10)
> DataSet: TPC-DS at 200 GB scale, in Parquet format, stored in Hive
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyway)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> {noformat}
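
The trace is the usual Kryo recursion pattern: one stack frame per level of 
object nesting. A minimal standalone sketch that provokes the same class of 
failure (an illustration of the mechanism only, not the TPC-DS code path):

{code}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output
import java.io.ByteArrayOutputStream
import java.util.Collections

val kryo = new Kryo()

// Build a collection nested far deeper than the default JVM stack allows.
var nested: AnyRef = Collections.emptyList[AnyRef]()
for (_ <- 1 to 100000) {
  nested = Collections.singletonList(nested)
}

val output = new Output(new ByteArrayOutputStream())
// CollectionSerializer recurses once per nesting level, so this dies with
// java.lang.StackOverflowError, as in the broadcast-exchange thread above.
kryo.writeClassAndObject(output, nested)
{code}

Raising the thread stack size (e.g. via -Xss) would only move the limit; the 
durable fix is to avoid serializing such deeply nested structures or to 
serialize them iteratively.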



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14521:
---
Affects Version/s: 2.0.0

> StackOverflowError in Kryo when executing TPC-DS Query27
> 
>
> Key: SPARK-14521
> URL: https://issues.apache.org/jira/browse/SPARK-14521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Rajesh Balamohan
>
> Build details: Spark build from master branch (Apr-10)
> DataSet: TPC-DS at 200 GB scale, in Parquet format, stored in Hive
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyway)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27

2016-04-11 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236094#comment-15236094
 ] 

Josh Rosen commented on SPARK-14521:


Downgrading to Kryo 2 is not an option, so we'll have to fix this.

> StackOverflowError in Kryo when executing TPC-DS Query27
> 
>
> Key: SPARK-14521
> URL: https://issues.apache.org/jira/browse/SPARK-14521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Rajesh Balamohan
>
> Build details: Spark build from master branch (Apr-10)
> DataSet: TPC-DS at 200 GB scale, in Parquet format, stored in Hive
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyway)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27

2016-04-11 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14521:
---
Priority: Critical  (was: Major)

> StackOverflowError in Kryo when executing TPC-DS Query27
> 
>
> Key: SPARK-14521
> URL: https://issues.apache.org/jira/browse/SPARK-14521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Rajesh Balamohan
>Priority: Critical
>
> Build details: Spark build from master branch (Apr-10)
> DataSet: TPC-DS at 200 GB scale, in Parquet format, stored in Hive
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyway)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


