[jira] [Updated] (SPARK-14552) ReValue wrapper for SparkR
[ https://issues.apache.org/jira/browse/SPARK-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Singh updated SPARK-14552: --- Description: Implement the wrapper for VectorIndexer. The inspiring idea is from dply package in R x <- c("a", "b", "c") revalue(x, c(a = "1", c = "2")) was: Implement the wrapper for VectorIndexer. In R in the dply package one can do the following x <- c("a", "b", "c") revalue(x, c(a = "1", c = "2")) > ReValue wrapper for SparkR > -- > > Key: SPARK-14552 > URL: https://issues.apache.org/jira/browse/SPARK-14552 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > Implement the wrapper for VectorIndexer. > The inspiring idea is from dply package in R > x <- c("a", "b", "c") > revalue(x, c(a = "1", c = "2")) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
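For context, revalue() remaps selected values of a character or factor vector. Until a dedicated SparkR wrapper exists, a rough Spark-side analogue can be sketched with when/otherwise column expressions. This is a minimal Scala sketch, not the proposed wrapper itself; it assumes a spark-shell session with {{sqlContext}} in scope and a hypothetical string column named "x".

{code}
import org.apache.spark.sql.functions._

// Hypothetical column "x" holding the values to remap
val df = sqlContext.createDataFrame(Seq(Tuple1("a"), Tuple1("b"), Tuple1("c"))).toDF("x")

// Remap "a" -> "1" and "c" -> "2", leaving other values unchanged,
// mirroring revalue(x, c(a = "1", c = "2"))
val revalued = df.withColumn("x",
  when(col("x") === "a", "1")
    .when(col("x") === "c", "2")
    .otherwise(col("x")))

revalued.show()
{code}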
[jira] [Updated] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter defined
[ https://issues.apache.org/jira/browse/SPARK-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kashish Jain updated SPARK-14557: - Remaining Estimate: (was: 168h) Original Estimate: (was: 168h) > CTAS (save as textfile) doesn't work with pathFilter defined > > > Key: SPARK-14557 > URL: https://issues.apache.org/jira/browse/SPARK-14557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1, 1.3.2, 1.5.2 >Reporter: Kashish Jain > > When the pathFilter is enabled in hive-site.xml, the queries fail on the > table created through CTAS. > Query fired for creating the table > create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS > TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from > limit 5 > Query which fails - Select * from > Exception Observed - > java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal > character in scheme name at index 10: part-0,hdfs: > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:172) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter enabled
[ https://issues.apache.org/jira/browse/SPARK-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kashish Jain updated SPARK-14557: - Summary: CTAS (save as textfile) doesn't work with pathFilter enabled (was: CTAS (save as textfile) doesn't work with pathFilter defined) > CTAS (save as textfile) doesn't work with pathFilter enabled > > > Key: SPARK-14557 > URL: https://issues.apache.org/jira/browse/SPARK-14557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1, 1.3.2, 1.5.2 >Reporter: Kashish Jain > > When the pathFilter is enabled in hive-site.xml, the queries fail on the > table created through CTAS. > Query fired for creating the table > create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS > TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from > limit 5 > Query which fails - Select * from > Exception Observed - > java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal > character in scheme name at index 10: part-0,hdfs: > at org.apache.hadoop.fs.Path.initialize(Path.java:206) > at org.apache.hadoop.fs.Path.(Path.java:172) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14557) CTAS (save as textfile) doesn't work with pathFilter defined
Kashish Jain created SPARK-14557: Summary: CTAS (save as textfile) doesn't work with pathFilter defined Key: SPARK-14557 URL: https://issues.apache.org/jira/browse/SPARK-14557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.2, 1.3.1, 1.3.2 Reporter: Kashish Jain When the pathFilter is enabled in hive-site.xml, the queries fail on the table created through CTAS. Query fired for creating the table create table CTAS1(field1 int, field2 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE as select field1, field2 from limit 5 Query which fails - Select * from Exception Observed - java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 10: part-0,hdfs: at org.apache.hadoop.fs.Path.initialize(Path.java:206) at org.apache.hadoop.fs.Path.(Path.java:172) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
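The source table name in the quoted queries appears to have been stripped in transit. A hedged reconstruction of the reported repro, with a hypothetical source table "src" standing in for the elided name and assuming a Hive-enabled {{sqlContext}} (the pathFilter itself being configured in hive-site.xml as described), would look roughly like:

{code}
// Hypothetical repro; "src" stands in for the elided source table name.
sqlContext.sql(
  """CREATE TABLE CTAS1(field1 INT, field2 INT)
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    |STORED AS TEXTFILE
    |AS SELECT field1, field2 FROM src LIMIT 5""".stripMargin)

// With a pathFilter enabled in hive-site.xml, this read is reported to fail
// with the URISyntaxException shown in the stack trace above.
sqlContext.sql("SELECT * FROM CTAS1").show()
{code}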
[jira] [Resolved] (SPARK-14535) Remove buildInternalScan from FileFormat
[ https://issues.apache.org/jira/browse/SPARK-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-14535. -- Resolution: Fixed Issue resolved by pull request 12300 [https://github.com/apache/spark/pull/12300] > Remove buildInternalScan from FileFormat > > > Key: SPARK-14535 > URL: https://issues.apache.org/jira/browse/SPARK-14535 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14554) disable whole stage codegen if there are too many input columns
[ https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-14554. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12322 [https://github.com/apache/spark/pull/12322] > disable whole stage codegen if there are too many input columns > --- > > Key: SPARK-14554 > URL: https://issues.apache.org/jira/browse/SPARK-14554 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Critical > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR
[ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236638#comment-15236638 ] Narine Kokhlikyan commented on SPARK-12922: --- [~sunrui], Thank you very much for the explanation! Now I got it! > Implement gapply() on DataFrame in SparkR > - > > Key: SPARK-12922 > URL: https://issues.apache.org/jira/browse/SPARK-12922 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.6.0 >Reporter: Sun Rui > > gapply() applies an R function on groups grouped by one or more columns of a > DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() > in the Dataset API. > Two API styles are supported: > 1. > {code} > gd <- groupBy(df, col1, ...) > gapply(gd, function(grouping_key, group) {}, schema) > {code} > 2. > {code} > gapply(df, grouping_columns, function(grouping_key, group) {}, schema) > {code} > R function input: grouping keys value, a local data.frame of this grouped > data > R function output: local data.frame > Schema specifies the Row format of the output of the R function. It must > match the R function's output. > Note that map-side combination (partial aggregation) is not supported, user > could do map-side combination via dapply(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR
[ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236622#comment-15236622 ] Sun Rui commented on SPARK-12922: - [~Narine] DataFrame and Dataset are now converged. DataFrame is a different view of Dataset, that is Dataset. So groupByKey is the same method for both Dataset and DataFrame, but the `func` is different as the data element view is different, for example: {code} val ds = Seq((1,2), (3,4)).toDS val gd = ds.groupByKey(v=>v._1) val df = ds.toDF val gd1 = df.groupByKey(r=>r.getInt(0)) {code} > Implement gapply() on DataFrame in SparkR > - > > Key: SPARK-12922 > URL: https://issues.apache.org/jira/browse/SPARK-12922 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.6.0 >Reporter: Sun Rui > > gapply() applies an R function on groups grouped by one or more columns of a > DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() > in the Dataset API. > Two API styles are supported: > 1. > {code} > gd <- groupBy(df, col1, ...) > gapply(gd, function(grouping_key, group) {}, schema) > {code} > 2. > {code} > gapply(df, grouping_columns, function(grouping_key, group) {}, schema) > {code} > R function input: grouping keys value, a local data.frame of this grouped > data > R function output: local data.frame > Schema specifies the Row format of the output of the R function. It must > match the R function's output. > Note that map-side combination (partial aggregation) is not supported, user > could do map-side combination via dapply(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
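Since the issue description likens gapply() to GroupedDataSet.flatMapGroups(), here is a minimal Scala sketch of that Dataset-side pattern, continuing the grouped Dataset {{gd}} from the snippet above (assumes spark-shell with the SQL implicits in scope):

{code}
// Apply a function to each group: the key is paired with an iterator over
// that group's elements, and the per-group results are flattened into a Dataset.
val summed = gd.flatMapGroups { (key, values) =>
  Iterator((key, values.map(_._2).sum))
}
summed.show()
{code}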
[jira] [Updated] (SPARK-14554) disable whole stage codegen if there are too many input columns
[ https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-14554: Summary: disable whole stage codegen if there are too many input columns (was: Dataset.map may generate wrong java code for wide table) > disable whole stage codegen if there are too many input columns > --- > > Key: SPARK-14554 > URL: https://issues.apache.org/jira/browse/SPARK-14554 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14546) Scale Wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236556#comment-15236556 ] Yong Tang commented on SPARK-14546: --- [~aloknsingh] I can work on this one if no one has started yet. Thanks. > Scale Wrapper in SparkR > --- > > Key: SPARK-14546 > URL: https://issues.apache.org/jira/browse/SPARK-14546 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > ML has the StandardScaler, which seems to be very commonly used. > This jira is to implement the SparkR wrapper for it. > Here is the R scale command: > https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
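For reference, the Scala-side estimator that a scale()-like SparkR wrapper would delegate to is ml.feature.StandardScaler. A minimal sketch (toy data, column names hypothetical, assuming spark-shell with {{sqlContext}} in scope):

{code}
import org.apache.spark.ml.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical input: a DataFrame with a Vector column "features"
val df = sqlContext.createDataFrame(Seq(
  (0, Vectors.dense(1.0, 0.5, -1.0)),
  (1, Vectors.dense(2.0, 1.0, 1.0)),
  (2, Vectors.dense(4.0, 10.0, 2.0))
)).toDF("id", "features")

// Center and scale each dimension, mirroring R's scale(x, center = TRUE, scale = TRUE)
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)
  .setWithStd(true)

val model = scaler.fit(df)
model.transform(df).show()
{code}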
[jira] [Updated] (SPARK-14551) Reduce number of NameNode calls in OrcRelation with FileSourceStrategy mode
[ https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated SPARK-14551: - Summary: Reduce number of NameNode calls in OrcRelation with FileSourceStrategy mode (was: Reduce number of NN calls in OrcRelation with FileSourceStrategy mode) > Reduce number of NameNode calls in OrcRelation with FileSourceStrategy mode > --- > > Key: SPARK-14551 > URL: https://issues.apache.org/jira/browse/SPARK-14551 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Rajesh Balamohan >Priority: Minor > > When FileSourceStrategy is used, record reader is created which incurs a NN > call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading > the file information to get the ObjectInspector. This incurs additional NN > call. It would be good to avoid this additional NN call (specifically for > partitioned datasets) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-14132. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12220 [https://github.com/apache/spark/pull/12220] > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support add and drop partitions. For now, it > should be fine to throw an exception for the rest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
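For illustration, the add/drop partition commands that are now handled natively look like this (hypothetical partitioned Hive table, issued through a Hive-enabled {{sqlContext}}):

{code}
// Hypothetical Hive table "logs" partitioned by (ds STRING)
sqlContext.sql("ALTER TABLE logs ADD IF NOT EXISTS PARTITION (ds='2016-04-11')")
sqlContext.sql("ALTER TABLE logs DROP IF EXISTS PARTITION (ds='2016-04-11')")
{code}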
[jira] [Commented] (SPARK-14409) Investigate adding a RankingEvaluator to ML
[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236541#comment-15236541 ] Yong Tang commented on SPARK-14409: --- [~mlnick] [~josephkb] I added a short doc in google driver with comment enabled: https://docs.google.com/document/d/1YEvf5eEm2vRcALJs39yICWmUx6xFW5j8DvXFWbRbStE/edit?usp=sharing Please let me know if there is any feedback. Thanks > Investigate adding a RankingEvaluator to ML > --- > > Key: SPARK-14409 > URL: https://issues.apache.org/jira/browse/SPARK-14409 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Nick Pentreath >Priority: Minor > > {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no > {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful > for recommendation evaluation (and can be useful in other settings > potentially). > Should be thought about in conjunction with adding the "recommendAll" methods > in SPARK-13857, so that top-k ranking metrics can be used in cross-validators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
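For background, the existing RDD-based RankingMetrics that a RankingEvaluator would mirror is used roughly as follows (toy data; assumes a SparkContext {{sc}} as in spark-shell):

{code}
import org.apache.spark.mllib.evaluation.RankingMetrics

// Each record: (predicted ranking of item ids, ground-truth relevant item ids)
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4, 5), Array(1, 2, 5)),
  (Array(4, 1, 2),       Array(1, 3))
))

val metrics = new RankingMetrics(predictionAndLabels)
println(metrics.precisionAt(3))
println(metrics.meanAveragePrecision)
println(metrics.ndcgAt(3))
{code}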
[jira] [Assigned] (SPARK-14554) Dataset.map may generate wrong java code for wide table
[ https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14554: Assignee: Wenchen Fan (was: Apache Spark) > Dataset.map may generate wrong java code for wide table > --- > > Key: SPARK-14554 > URL: https://issues.apache.org/jira/browse/SPARK-14554 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state
[ https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14556: Assignee: Apache Spark > Code clean-ups for package o.a.s.sql.execution.streaming.state > -- > > Key: SPARK-14556 > URL: https://issues.apache.org/jira/browse/SPARK-14556 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liwei Lin >Assignee: Apache Spark >Priority: Minor > > - `StateStoreConf.**max**DeltasForSnapshot` was renamed to > `StateStoreConf.**min**DeltasForSnapshot` > - some state switch checks were added > - improved consistency between method names and string literals -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14554) Dataset.map may generate wrong java code for wide table
[ https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14554: Assignee: Apache Spark (was: Wenchen Fan) > Dataset.map may generate wrong java code for wide table > --- > > Key: SPARK-14554 > URL: https://issues.apache.org/jira/browse/SPARK-14554 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state
[ https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236543#comment-15236543 ] Apache Spark commented on SPARK-14556: -- User 'lw-lin' has created a pull request for this issue: https://github.com/apache/spark/pull/12323 > Code clean-ups for package o.a.s.sql.execution.streaming.state > -- > > Key: SPARK-14556 > URL: https://issues.apache.org/jira/browse/SPARK-14556 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liwei Lin >Priority: Minor > > - `StateStoreConf.**max**DeltasForSnapshot` was renamed to > `StateStoreConf.**min**DeltasForSnapshot` > - some state switch checks were added > - improved consistency between method names and string literals -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state
[ https://issues.apache.org/jira/browse/SPARK-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14556: Assignee: (was: Apache Spark) > Code clean-ups for package o.a.s.sql.execution.streaming.state > -- > > Key: SPARK-14556 > URL: https://issues.apache.org/jira/browse/SPARK-14556 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liwei Lin >Priority: Minor > > - `StateStoreConf.**max**DeltasForSnapshot` was renamed to > `StateStoreConf.**min**DeltasForSnapshot` > - some state switch checks were added > - improved consistency between method names and string literals -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14554) Dataset.map may generate wrong java code for wide table
[ https://issues.apache.org/jira/browse/SPARK-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236544#comment-15236544 ] Apache Spark commented on SPARK-14554: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/12322 > Dataset.map may generate wrong java code for wide table > --- > > Key: SPARK-14554 > URL: https://issues.apache.org/jira/browse/SPARK-14554 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14556) Code clean-ups for package o.a.s.sql.execution.streaming.state
Liwei Lin created SPARK-14556: - Summary: Code clean-ups for package o.a.s.sql.execution.streaming.state Key: SPARK-14556 URL: https://issues.apache.org/jira/browse/SPARK-14556 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Liwei Lin Priority: Minor - `StateStoreConf.**max**DeltasForSnapshot` was renamed to `StateStoreConf.**min**DeltasForSnapshot` - some state switch checks were added - improved consistency between method names and string literals -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14362) DDL Native Support: Drop View
[ https://issues.apache.org/jira/browse/SPARK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236529#comment-15236529 ] Apache Spark commented on SPARK-14362: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/12321 > DDL Native Support: Drop View > - > > Key: SPARK-14362 > URL: https://issues.apache.org/jira/browse/SPARK-14362 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > Native parsing and native analysis of DDL command: Drop View. > Based on the HIVE DDL document for > [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL- > DropView > ), `DROP VIEW` is defined as, > Syntax: > {noformat} > DROP VIEW [IF EXISTS] [db_name.]view_name; > {noformat} > - to remove metadata for the specified view. > - illegal to use DROP TABLE on a view. > - illegal to use DROP VIEW on a table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
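A small usage sketch of the command described above (hypothetical table and view names, assuming a Hive-enabled {{sqlContext}}):

{code}
// Hypothetical view over an existing table "people"
sqlContext.sql("CREATE VIEW adult_people AS SELECT * FROM people WHERE age >= 18")

// Removes only the view's metadata
sqlContext.sql("DROP VIEW IF EXISTS adult_people")

// Per the description, DROP TABLE on a view (and DROP VIEW on a table)
// is illegal and should be rejected by the native command.
{code}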
[jira] [Commented] (SPARK-14406) Drop Table
[ https://issues.apache.org/jira/browse/SPARK-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236530#comment-15236530 ] Apache Spark commented on SPARK-14406: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/12321 > Drop Table > -- > > Key: SPARK-14406 > URL: https://issues.apache.org/jira/browse/SPARK-14406 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > Fix For: 2.0.0 > > > Right now, DropTable command is in hive module. We should remove the call of > runSqlHive and move it to sql/core. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14555) Python API for methods introduced for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14555: Assignee: (was: Apache Spark) > Python API for methods introduced for Structured Streaming > -- > > Key: SPARK-14555 > URL: https://issues.apache.org/jira/browse/SPARK-14555 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Streaming >Reporter: Burak Yavuz > > Methods added for Structured Streaming don't have a Python API yet. > We need to provide APIs for the new methods in: > - DataFrameReader > - DataFrameWriter > - ContinuousQuery > - Trigger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14555) Python API for methods introduced for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14555: Assignee: Apache Spark > Python API for methods introduced for Structured Streaming > -- > > Key: SPARK-14555 > URL: https://issues.apache.org/jira/browse/SPARK-14555 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Streaming >Reporter: Burak Yavuz >Assignee: Apache Spark > > Methods added for Structured Streaming don't have a Python API yet. > We need to provide APIs for the new methods in: > - DataFrameReader > - DataFrameWriter > - ContinuousQuery > - Trigger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14555) Python API for methods introduced for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236528#comment-15236528 ] Apache Spark commented on SPARK-14555: -- User 'brkyvz' has created a pull request for this issue: https://github.com/apache/spark/pull/12320 > Python API for methods introduced for Structured Streaming > -- > > Key: SPARK-14555 > URL: https://issues.apache.org/jira/browse/SPARK-14555 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Streaming >Reporter: Burak Yavuz > > Methods added for Structured Streaming don't have a Python API yet. > We need to provide APIs for the new methods in: > - DataFrameReader > - DataFrameWriter > - ContinuousQuery > - Trigger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14555) Python API for methods introduced for Structured Streaming
Burak Yavuz created SPARK-14555: --- Summary: Python API for methods introduced for Structured Streaming Key: SPARK-14555 URL: https://issues.apache.org/jira/browse/SPARK-14555 Project: Spark Issue Type: Sub-task Components: PySpark, SQL, Streaming Reporter: Burak Yavuz Methods added for Structured Streaming don't have a Python API yet. We need to provide APIs for the new methods in: - DataFrameReader - DataFrameWriter - ContinuousQuery - Trigger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14554) Dataset.map may generate wrong java code for wide table
Wenchen Fan created SPARK-14554: --- Summary: Dataset.map may generate wrong java code for wide table Key: SPARK-14554 URL: https://issues.apache.org/jira/browse/SPARK-14554 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11828) DAGScheduler source registered too early with MetricsSystem
[ https://issues.apache.org/jira/browse/SPARK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236510#comment-15236510 ] Dubkov Mikhail commented on SPARK-11828: [~vanzin] Could you please look into http://stackoverflow.com/questions/36133952/why-cant-i-run-spark-shell-with-yarn-in-client-mode/36561486#36561486 We have NPE on line which you added, maybe you have ideas why it happens ? Thanks! > DAGScheduler source registered too early with MetricsSystem > --- > > Key: SPARK-11828 > URL: https://issues.apache.org/jira/browse/SPARK-11828 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Minor > Fix For: 1.6.0 > > > I see this log message when starting apps on YARN: > {quote} > 15/11/18 13:12:56 WARN MetricsSystem: Using default name DAGScheduler for > source because spark.app.id is not set. > {quote} > That's because DAGScheduler registers itself with the metrics system in its > constructor, and the DAGScheduler is instantiated before "spark.app.id" is > set in the context's SparkConf. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14550) OneHotEncoding wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-14550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236497#comment-15236497 ] Alok Singh commented on SPARK-14550: Hi [~mengxr] exposing one hot encoding to the sparkR will help a lot to the R user. So created this jira. Let us know any feedback if any thanks Alok > OneHotEncoding wrapper in SparkR > > > Key: SPARK-14550 > URL: https://issues.apache.org/jira/browse/SPARK-14550 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > Implement OneHotEncoding in R. > In R , usually one can use model.matrix to do one hot encoding. which accepts > formula. I think we can support simple formula here. > model.matrix doc: > https://stat.ethz.ch/R-manual/R-devel/library/stats/html/model.matrix.html > here is the example, that would be nice to have > example : > http://stackoverflow.com/questions/16200241/recode-categorical-factor-with-n-categories-into-n-binary-columns -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14553) PCA wrapper for SparkR
[ https://issues.apache.org/jira/browse/SPARK-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236495#comment-15236495 ] Alok Singh commented on SPARK-14553: Hi [~mengxr] , Since all the ml apis are now expanded in 1.6. It would be nice to create more algorithm exposure to the SparkR what do you think? thanks Alok > PCA wrapper for SparkR > -- > > Key: SPARK-14553 > URL: https://issues.apache.org/jira/browse/SPARK-14553 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > Implement the SparkR wrapper for the PCA transformer > https://spark.apache.org/docs/latest/ml-features.html#pca > we should support api similar to R i.e > featire<-prcomp(df, > center = TRUE, > scale. = TRUE) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14553) PCA wrapper for SparkR
[ https://issues.apache.org/jira/browse/SPARK-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Singh updated SPARK-14553: --- Description: Implement the SparkR wrapper for the PCA transformer https://spark.apache.org/docs/latest/ml-features.html#pca we should support api similar to R i.e featire<-prcomp(df, center = TRUE, scale. = TRUE) was: Implement the SparkR wrapper for the PCA transformer https://spark.apache.org/docs/latest/ml-features.html#pca we should support api similar to R i.e prcomp(log.ir, center = TRUE, scale. = TRUE) > PCA wrapper for SparkR > -- > > Key: SPARK-14553 > URL: https://issues.apache.org/jira/browse/SPARK-14553 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > Implement the SparkR wrapper for the PCA transformer > https://spark.apache.org/docs/latest/ml-features.html#pca > we should support api similar to R i.e > featire<-prcomp(df, > center = TRUE, > scale. = TRUE) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
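For reference, the ML transformer such a wrapper would expose is used like this on the Scala side (minimal sketch following the linked ml-features guide; toy data, assumes spark-shell with {{sqlContext}} in scope):

{code}
import org.apache.spark.ml.feature.PCA
import org.apache.spark.mllib.linalg.Vectors

val data = Seq(
  Vectors.dense(2.0, 0.0, 3.0, 4.0),
  Vectors.dense(0.0, 1.0, 4.0, 5.0),
  Vectors.dense(4.0, 10.0, 8.0, 6.0)
).map(Tuple1.apply)
val df = sqlContext.createDataFrame(data).toDF("features")

// Project the 4-dimensional features onto 2 principal components,
// roughly what prcomp() computes in R (without R's centering/scaling options)
val pcaModel = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(2)
  .fit(df)

pcaModel.transform(df).select("pcaFeatures").show()
{code}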
[jira] [Created] (SPARK-14553) PCA wrapper for SparkR
Alok Singh created SPARK-14553: -- Summary: PCA wrapper for SparkR Key: SPARK-14553 URL: https://issues.apache.org/jira/browse/SPARK-14553 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Alok Singh Implement the SparkR wrapper for the PCA transformer https://spark.apache.org/docs/latest/ml-features.html#pca we should support api similar to R i.e prcomp(log.ir, center = TRUE, scale. = TRUE) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14552) ReValue wrapper for SparkR
[ https://issues.apache.org/jira/browse/SPARK-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Singh updated SPARK-14552: --- Description: Implement the wrapper for VectorIndexer. In R in the dply package one can do the following x <- c("a", "b", "c") revalue(x, c(a = "1", c = "2")) was: Implement the wrapper for VectorIndexer. In R in the dply package one can do the following x <- c("a", "b", "c") revalue(x, c(a = "1", c = "2")) > ReValue wrapper for SparkR > -- > > Key: SPARK-14552 > URL: https://issues.apache.org/jira/browse/SPARK-14552 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > Implement the wrapper for VectorIndexer. > In R in the dply package one can do the following > x <- c("a", "b", "c") > revalue(x, c(a = "1", c = "2")) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236487#comment-15236487 ] Sean Owen commented on SPARK-14548: --- It says these are non-standard though. I can understand supporting them to let some legacy SQL query run but is it realistic to expect this is the only such issue? it doesn't seem worth supporting for its own sake as it's confusing. > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than which is equivalent to >= > !> means not greater than which is equivalent to <= > I'd to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
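For clarity, this is what the proposed operators would mean, together with the standard equivalents that already parse today (hypothetical table {{t}} with an integer column {{col1}}; the !> and !< forms are only proposed, not yet supported):

{code}
// Proposed rewrites:
//   col1 !> 5  means NOT (col1 > 5), i.e. col1 <= 5
//   col1 !< 5  means NOT (col1 < 5), i.e. col1 >= 5
// Today only the standard forms parse:
sqlContext.sql("SELECT * FROM t WHERE col1 <= 5")  // what "col1 !> 5" would mean
sqlContext.sql("SELECT * FROM t WHERE col1 >= 5")  // what "col1 !< 5" would mean
{code}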
[jira] [Created] (SPARK-14552) ReValue wrapper for SparkR
Alok Singh created SPARK-14552: -- Summary: ReValue wrapper for SparkR Key: SPARK-14552 URL: https://issues.apache.org/jira/browse/SPARK-14552 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Alok Singh Implement the wrapper for VectorIndexer. In R in the dply package one can do the following x <- c("a", "b", "c") revalue(x, c(a = "1", c = "2")) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236461#comment-15236461 ] Xiao Li edited comment on SPARK-14548 at 4/12/16 2:42 AM: -- MS SQL Server supports these operators. https://msdn.microsoft.com/en-us/library/ms188074.aspx was (Author: smilegator): https://msdn.microsoft.com/en-us/library/ms188074.aspx > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than which is equivalent to >= > !> means not greater than which is equivalent to <= > I'd to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR
[ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236484#comment-15236484 ] Narine Kokhlikyan commented on SPARK-12922: --- Thanks for the quick response, [~sunrui]. I was playing with KeyValueGroupedDataset and have noticed that it works only for Datasets. When I try groupByKey for a DataFrame, it fails. This succeeds: val grouped = ds.groupByKey(v => (v._1, "word")) But the following fails: val grouped = df.groupByKey(v => (v._1, "word")) As far as I know in SparkR we are working with DataFrames, so this means that I need to convert the DataFrame to Dataset and work on Datasets on scala side ?! Thanks, Narine > Implement gapply() on DataFrame in SparkR > - > > Key: SPARK-12922 > URL: https://issues.apache.org/jira/browse/SPARK-12922 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.6.0 >Reporter: Sun Rui > > gapply() applies an R function on groups grouped by one or more columns of a > DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() > in the Dataset API. > Two API styles are supported: > 1. > {code} > gd <- groupBy(df, col1, ...) > gapply(gd, function(grouping_key, group) {}, schema) > {code} > 2. > {code} > gapply(df, grouping_columns, function(grouping_key, group) {}, schema) > {code} > R function input: grouping keys value, a local data.frame of this grouped > data > R function output: local data.frame > Schema specifies the Row format of the output of the R function. It must > match the R function's output. > Note that map-side combination (partial aggregation) is not supported, user > could do map-side combination via dapply(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
[ https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14551: Assignee: (was: Apache Spark) > Reduce number of NN calls in OrcRelation with FileSourceStrategy mode > - > > Key: SPARK-14551 > URL: https://issues.apache.org/jira/browse/SPARK-14551 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Rajesh Balamohan >Priority: Minor > > When FileSourceStrategy is used, record reader is created which incurs a NN > call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading > the file information to get the ObjectInspector. This incurs additional NN > call. It would be good to avoid this additional NN call (specifically for > partitioned datasets) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
[ https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14551: Assignee: Apache Spark > Reduce number of NN calls in OrcRelation with FileSourceStrategy mode > - > > Key: SPARK-14551 > URL: https://issues.apache.org/jira/browse/SPARK-14551 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Rajesh Balamohan >Assignee: Apache Spark >Priority: Minor > > When FileSourceStrategy is used, record reader is created which incurs a NN > call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading > the file information to get the ObjectInspector. This incurs additional NN > call. It would be good to avoid this additional NN call (specifically for > partitioned datasets) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
[ https://issues.apache.org/jira/browse/SPARK-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236478#comment-15236478 ] Apache Spark commented on SPARK-14551: -- User 'rajeshbalamohan' has created a pull request for this issue: https://github.com/apache/spark/pull/12319 > Reduce number of NN calls in OrcRelation with FileSourceStrategy mode > - > > Key: SPARK-14551 > URL: https://issues.apache.org/jira/browse/SPARK-14551 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Rajesh Balamohan >Priority: Minor > > When FileSourceStrategy is used, record reader is created which incurs a NN > call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading > the file information to get the ObjectInspector. This incurs additional NN > call. It would be good to avoid this additional NN call (specifically for > partitioned datasets) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236461#comment-15236461 ] Xiao Li commented on SPARK-14548: - https://msdn.microsoft.com/en-us/library/ms188074.aspx > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than which is equivalent to >= > !> means not greater than which is equivalent to <= > I'd to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14513) Threads left behind after stopping SparkContext
[ https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14513: Assignee: Apache Spark > Threads left behind after stopping SparkContext > --- > > Key: SPARK-14513 > URL: https://issues.apache.org/jira/browse/SPARK-14513 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Terence Yim >Assignee: Apache Spark > > After {{SparkContext}} is stopped, there are couple threads that are left > behind. After some digging it is caused by couple bugs: > 1. The {{HttpBasedFileServer.shutdown()}} is not getting called during > {{NettyRpcEnv.shutdown()}}, hence a thread is left and block on the > {{ServerSocket.accept()}} from the underlying Jetty {{Server}}. > 2. {{QueuedThreadPool}} created in the {{HttpServer}} and through the > {{JettyUtils.startJettyServer}} method are never getting stopped. This is due > to the fact that thread pool used by Jetty {{Server}} won't get closed > automatically when the {{Server}} is stopped. > I'll send out a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14513) Threads left behind after stopping SparkContext
[ https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14513: Assignee: (was: Apache Spark) > Threads left behind after stopping SparkContext > --- > > Key: SPARK-14513 > URL: https://issues.apache.org/jira/browse/SPARK-14513 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Terence Yim > > After {{SparkContext}} is stopped, there are couple threads that are left > behind. After some digging it is caused by couple bugs: > 1. The {{HttpBasedFileServer.shutdown()}} is not getting called during > {{NettyRpcEnv.shutdown()}}, hence a thread is left and block on the > {{ServerSocket.accept()}} from the underlying Jetty {{Server}}. > 2. {{QueuedThreadPool}} created in the {{HttpServer}} and through the > {{JettyUtils.startJettyServer}} method are never getting stopped. This is due > to the fact that thread pool used by Jetty {{Server}} won't get closed > automatically when the {{Server}} is stopped. > I'll send out a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14513) Threads left behind after stopping SparkContext
[ https://issues.apache.org/jira/browse/SPARK-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236455#comment-15236455 ] Apache Spark commented on SPARK-14513: -- User 'chtyim' has created a pull request for this issue: https://github.com/apache/spark/pull/12318 > Threads left behind after stopping SparkContext > --- > > Key: SPARK-14513 > URL: https://issues.apache.org/jira/browse/SPARK-14513 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Terence Yim > > After {{SparkContext}} is stopped, there are couple threads that are left > behind. After some digging it is caused by couple bugs: > 1. The {{HttpBasedFileServer.shutdown()}} is not getting called during > {{NettyRpcEnv.shutdown()}}, hence a thread is left and block on the > {{ServerSocket.accept()}} from the underlying Jetty {{Server}}. > 2. {{QueuedThreadPool}} created in the {{HttpServer}} and through the > {{JettyUtils.startJettyServer}} method are never getting stopped. This is due > to the fact that thread pool used by Jetty {{Server}} won't get closed > automatically when the {{Server}} is stopped. > I'll send out a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14551) Reduce number of NN calls in OrcRelation with FileSourceStrategy mode
Rajesh Balamohan created SPARK-14551: Summary: Reduce number of NN calls in OrcRelation with FileSourceStrategy mode Key: SPARK-14551 URL: https://issues.apache.org/jira/browse/SPARK-14551 Project: Spark Issue Type: Improvement Components: SQL Reporter: Rajesh Balamohan Priority: Minor When FileSourceStrategy is used, record reader is created which incurs a NN call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading the file information to get the ObjectInspector. This incurs additional NN call. It would be good to avoid this additional NN call (specifically for partitioned datasets) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236449#comment-15236449 ] Sean Owen commented on SPARK-14548: --- I've honestly never heard of these operators in any language. Does something support this syntax? Why would I write !> instead of the much more familiar <= ? > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than which is equivalent to >= > !> means not greater than which is equivalent to <= > I'd to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14520) ClasscastException thrown with spark.sql.parquet.enableVectorizedReader=true
[ https://issues.apache.org/jira/browse/SPARK-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14520. - Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 > ClasscastException thrown with spark.sql.parquet.enableVectorizedReader=true > > > Key: SPARK-14520 > URL: https://issues.apache.org/jira/browse/SPARK-14520 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Rajesh Balamohan >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > Build details: Spark build from master branch (Apr-10) > TPC-DS at 200 GB scale stored in Parq format stored in hive. > Ran TPC-DS Query27 via Spark beeline client with > "spark.sql.sources.fileScan=false". > {noformat} > java.lang.ClassCastException: > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader > cannot be cast to org.apache.parquet.hadoop.ParquetRecordReader > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:480) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:476) > at > org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.(SqlNewHadoopRDD.scala:161) > at > org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:121) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69) > at org.apache.spark.scheduler.Task.run(Task.scala:82) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:231) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Creating this JIRA as a placeholder to track this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local
[ https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236429#comment-15236429 ] Apache Spark commented on SPARK-14549: -- User 'dbtsai' has created a pull request for this issue: https://github.com/apache/spark/pull/12317 > Copy the Vector and Matrix classes from mllib to ml in mllib-local > -- > > Key: SPARK-14549 > URL: https://issues.apache.org/jira/browse/SPARK-14549 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: DB Tsai >Assignee: DB Tsai > > This task will copy the Vector and Matrix classes from mllib to ml package in > mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will > be removed from now. UDTs will be achieved by #SPARK-14487, and `since` will > be replaced by /* @ since 1.2.0 */ > The BLAS implementation will be copied, and some of the test utilities will > be copies as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local
[ https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14549: Assignee: DB Tsai (was: Apache Spark) > Copy the Vector and Matrix classes from mllib to ml in mllib-local > -- > > Key: SPARK-14549 > URL: https://issues.apache.org/jira/browse/SPARK-14549 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: DB Tsai >Assignee: DB Tsai > > This task will copy the Vector and Matrix classes from mllib to ml package in > mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will > be removed from now. UDTs will be achieved by #SPARK-14487, and `since` will > be replaced by /* @ since 1.2.0 */ > The BLAS implementation will be copied, and some of the test utilities will > be copies as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local
[ https://issues.apache.org/jira/browse/SPARK-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14549: Assignee: Apache Spark (was: DB Tsai) > Copy the Vector and Matrix classes from mllib to ml in mllib-local > -- > > Key: SPARK-14549 > URL: https://issues.apache.org/jira/browse/SPARK-14549 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: DB Tsai >Assignee: Apache Spark > > This task will copy the Vector and Matrix classes from mllib to ml package in > mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will > be removed from now. UDTs will be achieved by #SPARK-14487, and `since` will > be replaced by /* @ since 1.2.0 */ > The BLAS implementation will be copied, and some of the test utilities will > be copies as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14550) OneHotEncoding wrapper in SparkR
Alok Singh created SPARK-14550: -- Summary: OneHotEncoding wrapper in SparkR Key: SPARK-14550 URL: https://issues.apache.org/jira/browse/SPARK-14550 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Alok Singh Implement OneHotEncoding in R. In R , usually one can use model.matrix to do one hot encoding. which accepts formula. I think we can support simple formula here. model.matrix doc: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/model.matrix.html here is the example, that would be nice to have example : http://stackoverflow.com/questions/16200241/recode-categorical-factor-with-n-categories-into-n-binary-columns -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
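For reference, the Scala-side building blocks such a wrapper would likely compose are StringIndexer followed by OneHotEncoder, as in the ml-features guide. A minimal sketch (toy data, assumes spark-shell with {{sqlContext}} in scope):

{code}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

val df = sqlContext.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")
)).toDF("id", "category")

// First map the string categories to numeric indices...
val indexed = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
  .transform(df)

// ...then expand each index into a sparse binary indicator vector
// (dropLast is true by default, so one category maps to the all-zeros vector).
val encoded = new OneHotEncoder()
  .setInputCol("categoryIndex")
  .setOutputCol("categoryVec")
  .transform(indexed)

encoded.show()
{code}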
[jira] [Commented] (SPARK-13352) BlockFetch does not scale well on large block
[ https://issues.apache.org/jira/browse/SPARK-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236426#comment-15236426 ] Zhang, Liye commented on SPARK-13352: - [~davies], the last result for 500M should be 7.8 seconds, not 7.8 min, right? > BlockFetch does not scale well on large block > - > > Key: SPARK-13352 > URL: https://issues.apache.org/jira/browse/SPARK-13352 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Reporter: Davies Liu >Assignee: Zhang, Liye >Priority: Critical > Fix For: 1.6.2, 2.0.0 > > > BlockManager.getRemoteBytes() perform poorly on large block > {code} > test("block manager") { > val N = 500 << 20 > val bm = sc.env.blockManager > val blockId = TaskResultBlockId(0) > val buffer = ByteBuffer.allocate(N) > buffer.limit(N) > bm.putBytes(blockId, buffer, StorageLevel.MEMORY_AND_DISK_SER) > val result = bm.getRemoteBytes(blockId) > assert(result.isDefined) > assert(result.get.limit() === (N)) > } > {code} > Here are runtime for different block sizes: > {code} > 50M3 seconds > 100M 7 seconds > 250M 33 seconds > 500M 2 min > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14549) Copy the Vector and Matrix classes from mllib to ml in mllib-local
DB Tsai created SPARK-14549: --- Summary: Copy the Vector and Matrix classes from mllib to ml in mllib-local Key: SPARK-14549 URL: https://issues.apache.org/jira/browse/SPARK-14549 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: DB Tsai Assignee: DB Tsai This task will copy the Vector and Matrix classes from mllib to the ml package in the mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will be removed for now. UDTs will be handled by SPARK-14487, and `since` will be replaced by /* @since 1.2.0 */. The BLAS implementation will be copied, and some of the test utilities will be copied as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236416#comment-15236416 ] Apache Spark commented on SPARK-14548: -- User 'jliwork' has created a pull request for this issue: https://github.com/apache/spark/pull/12316 > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than, which is equivalent to >= > !> means not greater than, which is equivalent to <= > I'd like to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14548: Assignee: (was: Apache Spark) > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Priority: Minor > > !< means not less than, which is equivalent to >= > !> means not greater than, which is equivalent to <= > I'd like to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14548) Support !> and !< operator in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14548: Assignee: Apache Spark > Support !> and !< operator in Spark SQL > --- > > Key: SPARK-14548 > URL: https://issues.apache.org/jira/browse/SPARK-14548 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Jia Li >Assignee: Apache Spark >Priority: Minor > > !< means not less than, which is equivalent to >= > !> means not greater than, which is equivalent to <= > I'd like to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14475) Propagate user-defined context from driver to executors
[ https://issues.apache.org/jira/browse/SPARK-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14475. - Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.0.0 > Propagate user-defined context from driver to executors > --- > > Key: SPARK-14475 > URL: https://issues.apache.org/jira/browse/SPARK-14475 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Eric Liang >Assignee: Eric Liang > Fix For: 2.0.0 > > > It would be useful (e.g. for tracing) to automatically propagate arbitrary > user defined context (i.e. thread-locals) from the driver to executors. We > can do this easily by adding sc.localProperties to TaskContext. > cc [~joshrosen] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
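Since this is resolved for 2.0.0, a short Scala sketch of the intended usage; the property key is made up, and the executor-side accessor assumes the TaskContext.getLocalProperty method introduced by the fix (an sc from a spark-shell session is assumed to be in scope).
{code}
// Driver side: attach arbitrary context as a thread-local local property.
sc.setLocalProperty("user.trace.id", "abc-123")

// Executor side: tasks launched from this driver thread can read the propagated value.
sc.parallelize(1 to 4).foreach { _ =>
  val traceId = org.apache.spark.TaskContext.get().getLocalProperty("user.trace.id")
  println(s"trace id seen by task: $traceId")
}
{code}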
[jira] [Assigned] (SPARK-14547) Avoid DNS resolution for reusing connections
[ https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14547: Assignee: Reynold Xin (was: Apache Spark) > Avoid DNS resolution for reusing connections > > > Key: SPARK-14547 > URL: https://issues.apache.org/jira/browse/SPARK-14547 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14547) Avoid DNS resolution for reusing connections
[ https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14547: Assignee: Apache Spark (was: Reynold Xin) > Avoid DNS resolution for reusing connections > > > Key: SPARK-14547 > URL: https://issues.apache.org/jira/browse/SPARK-14547 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14547) Avoid DNS resolution for reusing connections
[ https://issues.apache.org/jira/browse/SPARK-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236402#comment-15236402 ] Apache Spark commented on SPARK-14547: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/12315 > Avoid DNS resolution for reusing connections > > > Key: SPARK-14547 > URL: https://issues.apache.org/jira/browse/SPARK-14547 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14548) Support !> and !< operator in Spark SQL
Jia Li created SPARK-14548: -- Summary: Support !> and !< operator in Spark SQL Key: SPARK-14548 URL: https://issues.apache.org/jira/browse/SPARK-14548 Project: Spark Issue Type: Improvement Components: SQL Reporter: Jia Li Priority: Minor !< means not less than, which is equivalent to >=. !> means not greater than, which is equivalent to <=. I'd like to create a PR to support these two operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
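To make the proposed semantics concrete, a small sketch; the {{!<}}/{{!>}} syntax is the proposal itself, not existing Spark SQL, and the table and column names are illustrative only.
{code}
// Each proposed form should return exactly the same rows as its rewrite.
sqlContext.sql("SELECT * FROM sales WHERE amount !< 100")   // proposed: not less than
sqlContext.sql("SELECT * FROM sales WHERE amount >= 100")   // equivalent today

sqlContext.sql("SELECT * FROM sales WHERE amount !> 100")   // proposed: not greater than
sqlContext.sql("SELECT * FROM sales WHERE amount <= 100")   // equivalent today
{code}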
[jira] [Created] (SPARK-14547) Avoid DNS resolution for reusing connections
Reynold Xin created SPARK-14547: --- Summary: Avoid DNS resolution for reusing connections Key: SPARK-14547 URL: https://issues.apache.org/jira/browse/SPARK-14547 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14546) Scale Wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alok Singh updated SPARK-14546: --- Summary: Scale Wrapper in SparkR (was: Scale Wrapper ) > Scale Wrapper in SparkR > --- > > Key: SPARK-14546 > URL: https://issues.apache.org/jira/browse/SPARK-14546 > Project: Spark > Issue Type: New Feature > Components: ML, SparkR >Reporter: Alok Singh > > ML has StandardScaler, which seems to be very commonly used. > This jira is to implement the SparkR wrapper for it. > Here is the R scale command: > https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14546) Scale Wrapper
Alok Singh created SPARK-14546: -- Summary: Scale Wrapper Key: SPARK-14546 URL: https://issues.apache.org/jira/browse/SPARK-14546 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Alok Singh ML has StandardScaler, which seems to be very commonly used. This jira is to implement the SparkR wrapper for it. Here is the R scale command: https://stat.ethz.ch/R-manual/R-devel/library/base/html/scale.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
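For reference, a minimal Scala sketch of the ML StandardScaler such a wrapper would expose; the toy DataFrame, column names, and the sqlContext in scope are assumptions for illustration only.
{code}
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}

val df = sqlContext.createDataFrame(Seq(
  (1.0, 10.0), (2.0, 20.0), (3.0, 30.0)
)).toDF("x", "y")

// StandardScaler operates on a vector column, so assemble the numeric inputs first.
val assembled = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("features")
  .transform(df)

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)   // roughly R's scale(center = TRUE, scale = TRUE)
  .setWithStd(true)

scaler.fit(assembled).transform(assembled).show()
{code}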
[jira] [Commented] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236359#comment-15236359 ] Bo Meng commented on SPARK-14441: - I think DDLSuite and DDLCommandSuite can be combined into one, as can HiveDDLSuite and HiveDDLCommandSuite, since they just test different stages. If you agree, I will make the changes. > Consolidate DDL tests > - > > Key: SPARK-14441 > URL: https://issues.apache.org/jira/browse/SPARK-14441 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing > whether a test should exist in one or the other. It also makes it less clear > whether our test coverage is comprehensive. Ideally we should consolidate > these files as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table
[ https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236353#comment-15236353 ] Adrian Wang commented on SPARK-14126: - I'm working on this. > [Table related commands] Truncate table > --- > > Key: SPARK-14126 > URL: https://issues.apache.org/jira/browse/SPARK-14126 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_TRUNCATETABLE > We also need to check the behavior of Hive when we call truncate table on a > partitioned table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236336#comment-15236336 ] Apache Spark commented on SPARK-14414: -- User 'bomeng' has created a pull request for this issue: https://github.com/apache/spark/pull/12314 > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14414: Assignee: Andrew Or (was: Apache Spark) > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14414: Assignee: Apache Spark (was: Andrew Or) > Make error messages consistent across DDLs > -- > > Key: SPARK-14414 > URL: https://issues.apache.org/jira/browse/SPARK-14414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Apache Spark > > There are many different error messages right now when the user tries to run > something that's not supported. We might throw AnalysisException or > ParseException or NoSuchFunctionException etc. We should make all of these > consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14543) SQL/Hive insertInto has unexpected results
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14543: Assignee: (was: Apache Spark) > SQL/Hive insertInto has unexpected results > -- > > Key: SPARK-14543 > URL: https://issues.apache.org/jira/browse/SPARK-14543 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ryan Blue > > The Hive write path adds a pre-insertion cast (projection) to reconcile > incoming data columns with the outgoing table schema. Columns are matched by > position and casts are inserted to reconcile the two column schemas. > When columns aren't correctly aligned, this causes unexpected results. I ran > into this by not using a correct {{partitionBy}} call (addressed by > SPARK-14459), which caused an error message that an int could not be cast to > an array. However, if the columns are vaguely compatible, for example string > and float, then no error or warning is produced and data is written to the > wrong columns using unexpected casts (string -> bigint -> float). > A real-world use case that will hit this is when a table definition changes > by adding a column in the middle of a table. Spark SQL statements that copied > from that table to a destination table will then map the columns differently > but insert casts that mask the problem. The last column's data will be > dropped without a reliable warning for the user. > This highlights a few problems: > * Too many or too few incoming data columns should cause an AnalysisException > to be thrown > * Only "safe" casts should be inserted automatically, like int -> long, using > UpCast > * Pre-insertion casts currently ignore extra columns by using zip > * The pre-insertion cast logic differs between Hive's MetastoreRelation and > LogicalRelation > Also, I think there should be an option to match input data to output columns > by name. The API allows operations on tables, which hide the column > resolution problem. It's easy to copy from one table to another without > listing the columns, and in the API it is common to work with columns by name > rather than by position. I think the API should add a way to match columns by > name, which is closer to what users expect. I propose adding something like > this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14543) SQL/Hive insertInto has unexpected results
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14543: Assignee: Apache Spark > SQL/Hive insertInto has unexpected results > -- > > Key: SPARK-14543 > URL: https://issues.apache.org/jira/browse/SPARK-14543 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ryan Blue >Assignee: Apache Spark > > The Hive write path adds a pre-insertion cast (projection) to reconcile > incoming data columns with the outgoing table schema. Columns are matched by > position and casts are inserted to reconcile the two column schemas. > When columns aren't correctly aligned, this causes unexpected results. I ran > into this by not using a correct {{partitionBy}} call (addressed by > SPARK-14459), which caused an error message that an int could not be cast to > an array. However, if the columns are vaguely compatible, for example string > and float, then no error or warning is produced and data is written to the > wrong columns using unexpected casts (string -> bigint -> float). > A real-world use case that will hit this is when a table definition changes > by adding a column in the middle of a table. Spark SQL statements that copied > from that table to a destination table will then map the columns differently > but insert casts that mask the problem. The last column's data will be > dropped without a reliable warning for the user. > This highlights a few problems: > * Too many or too few incoming data columns should cause an AnalysisException > to be thrown > * Only "safe" casts should be inserted automatically, like int -> long, using > UpCast > * Pre-insertion casts currently ignore extra columns by using zip > * The pre-insertion cast logic differs between Hive's MetastoreRelation and > LogicalRelation > Also, I think there should be an option to match input data to output columns > by name. The API allows operations on tables, which hide the column > resolution problem. It's easy to copy from one table to another without > listing the columns, and in the API it is common to work with columns by name > rather than by position. I think the API should add a way to match columns by > name, which is closer to what users expect. I propose adding something like > this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14543) SQL/Hive insertInto has unexpected results
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236315#comment-15236315 ] Apache Spark commented on SPARK-14543: -- User 'rdblue' has created a pull request for this issue: https://github.com/apache/spark/pull/12313 > SQL/Hive insertInto has unexpected results > -- > > Key: SPARK-14543 > URL: https://issues.apache.org/jira/browse/SPARK-14543 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ryan Blue > > The Hive write path adds a pre-insertion cast (projection) to reconcile > incoming data columns with the outgoing table schema. Columns are matched by > position and casts are inserted to reconcile the two column schemas. > When columns aren't correctly aligned, this causes unexpected results. I ran > into this by not using a correct {{partitionBy}} call (addressed by > SPARK-14459), which caused an error message that an int could not be cast to > an array. However, if the columns are vaguely compatible, for example string > and float, then no error or warning is produced and data is written to the > wrong columns using unexpected casts (string -> bigint -> float). > A real-world use case that will hit this is when a table definition changes > by adding a column in the middle of a table. Spark SQL statements that copied > from that table to a destination table will then map the columns differently > but insert casts that mask the problem. The last column's data will be > dropped without a reliable warning for the user. > This highlights a few problems: > * Too many or too few incoming data columns should cause an AnalysisException > to be thrown > * Only "safe" casts should be inserted automatically, like int -> long, using > UpCast > * Pre-insertion casts currently ignore extra columns by using zip > * The pre-insertion cast logic differs between Hive's MetastoreRelation and > LogicalRelation > Also, I think there should be an option to match input data to output columns > by name. The API allows operations on tables, which hide the column > resolution problem. It's easy to copy from one table to another without > listing the columns, and in the API it is common to work with columns by name > rather than by position. I think the API should add a way to match columns by > name, which is closer to what users expect. I propose adding something like > this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule
[ https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-14545: -- Description: Current `LikeSimplification` handles the following four rules. - 'a%' => expr.StartsWith("a") - '%b' => expr.EndsWith("b") - '%a%' => expr.Contains("a") - 'a' => EqualTo("a") This issue adds the following rule. - 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b") Here, 2 is statically calculated from "a".size + "b".size. was: Current `LikeSimplification` handles the following four rules. - 'a%' => expr.StartsWith("a") - '%b' => expr.EndsWith("b") - '%a%' => expr.Contains("a") - 'a' => EqualTo("a") This issue adds the following rule. - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b") Here, 2 is statically calculated from "a".size + "b".size. > Improve `LikeSimplification` by adding `a%b` rule > - > > Key: SPARK-14545 > URL: https://issues.apache.org/jira/browse/SPARK-14545 > Project: Spark > Issue Type: New Feature > Components: Optimizer >Reporter: Dongjoon Hyun > > Current `LikeSimplification` handles the following four rules. > - 'a%' => expr.StartsWith("a") > - '%b' => expr.EndsWith("b") > - '%a%' => expr.Contains("a") > - 'a' => EqualTo("a") > This issue adds the following rule. > - 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b") > Here, 2 is statically calculated from "a".size + "b".size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
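The equivalence the new rule relies on can be sketched with DataFrame predicates (the actual optimization is applied to Catalyst expressions; the column name here is illustrative):
{code}
import org.apache.spark.sql.functions.{col, length}

// 'a%b' matches iff the value starts with "a", ends with "b", and is long enough
// that the prefix and suffix do not overlap; 2 = "a".length + "b".length.
val viaLike    = col("s").like("a%b")
val viaRewrite = length(col("s")) >= 2 && col("s").startsWith("a") && col("s").endsWith("b")
{code}
The length guard is what rules out a string like "a" matching a pattern such as 'a%a', where the same character would otherwise satisfy both the prefix and the suffix test.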
[jira] [Assigned] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule
[ https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14545: Assignee: (was: Apache Spark) > Improve `LikeSimplification` by adding `a%b` rule > - > > Key: SPARK-14545 > URL: https://issues.apache.org/jira/browse/SPARK-14545 > Project: Spark > Issue Type: New Feature > Components: Optimizer >Reporter: Dongjoon Hyun > > Current `LikeSimplification` handles the following four rules. > - 'a%' => expr.StartsWith("a") > - '%b' => expr.EndsWith("b") > - '%a%' => expr.Contains("a") > - 'a' => EqualTo("a") > This issue adds the following rule. > - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b") > Here, 2 is statically calculated from "a".size + "b".size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule
[ https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14545: Assignee: Apache Spark > Improve `LikeSimplification` by adding `a%b` rule > - > > Key: SPARK-14545 > URL: https://issues.apache.org/jira/browse/SPARK-14545 > Project: Spark > Issue Type: New Feature > Components: Optimizer >Reporter: Dongjoon Hyun >Assignee: Apache Spark > > Current `LikeSimplification` handles the following four rules. > - 'a%' => expr.StartsWith("a") > - '%b' => expr.EndsWith("b") > - '%a%' => expr.Contains("a") > - 'a' => EqualTo("a") > This issue adds the following rule. > - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b") > Here, 2 is statically calculated from "a".size + "b".size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule
[ https://issues.apache.org/jira/browse/SPARK-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236293#comment-15236293 ] Apache Spark commented on SPARK-14545: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/12312 > Improve `LikeSimplification` by adding `a%b` rule > - > > Key: SPARK-14545 > URL: https://issues.apache.org/jira/browse/SPARK-14545 > Project: Spark > Issue Type: New Feature > Components: Optimizer >Reporter: Dongjoon Hyun > > Current `LikeSimplification` handles the following four rules. > - 'a%' => expr.StartsWith("a") > - '%b' => expr.EndsWith("b") > - '%a%' => expr.Contains("a") > - 'a' => EqualTo("a") > This issue adds the following rule. > - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b") > Here, 2 is statically calculated from "a".size + "b".size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14545) Improve `LikeSimplification` by adding `a%b` rule
Dongjoon Hyun created SPARK-14545: - Summary: Improve `LikeSimplification` by adding `a%b` rule Key: SPARK-14545 URL: https://issues.apache.org/jira/browse/SPARK-14545 Project: Spark Issue Type: New Feature Components: Optimizer Reporter: Dongjoon Hyun Current `LikeSimplification` handles the following four rules. - 'a%' => expr.StartsWith("a") - '%b' => expr.EndsWith("b") - '%a%' => expr.Contains("a") - 'a' => EqualTo("a") This issue adds the following rule. - 'a%b' => expr.Length() > 2 && expr.StartsWith("a") && expr.EndsWith("b") Here, 2 is statically calculated from "a".size + "b".size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14544) Spark UI is very slow in recent Chrome
[ https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14544: Assignee: Davies Liu (was: Apache Spark) > Spark UI is very slow in recent Chrome > -- > > Key: SPARK-14544 > URL: https://issues.apache.org/jira/browse/SPARK-14544 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > > Once I run a complicated query or there are many queries in the SQL tab, the > page is really slow in Chrome 49, but fast in Safari/Firefox. > Given that many users are on Chrome, we should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14544) Spark UI is very slow in recent Chrome
[ https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236211#comment-15236211 ] Apache Spark commented on SPARK-14544: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12311 > Spark UI is very slow in recent Chrome > -- > > Key: SPARK-14544 > URL: https://issues.apache.org/jira/browse/SPARK-14544 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > > Once I run a complicated query or there are many queries in the SQL tab, the > page is really slow in Chrome 49, but fast in Safari/Firefox. > Given that many users are on Chrome, we should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14544) Spark UI is very slow in recent Chrome
[ https://issues.apache.org/jira/browse/SPARK-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14544: Assignee: Apache Spark (was: Davies Liu) > Spark UI is very slow in recent Chrome > -- > > Key: SPARK-14544 > URL: https://issues.apache.org/jira/browse/SPARK-14544 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > > Once I run a complicated query or there are many queries in the SQL tab, the > page is really slow in Chrome 49, but fast in Safari/Firefox. > Given that many users are on Chrome, we should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10521: --- Assignee: Luciano Resende > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende >Assignee: Luciano Resende > Fix For: 2.0.0 > > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10521. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 9893 [https://github.com/apache/spark/pull/9893] > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende > Fix For: 2.0.0 > > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14401) Switch to stock sbt-pom-reader plugin
[ https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14401: Assignee: Apache Spark > Switch to stock sbt-pom-reader plugin > - > > Key: SPARK-14401 > URL: https://issues.apache.org/jira/browse/SPARK-14401 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Apache Spark > > Spark currently depends on a forked version of {{sbt-pom-reader}} which we > build from source. It would be great to port our modifications to the > upstream project so that we can migrate to the official version and stop > maintaining our fork. > [~scrapco...@gmail.com], could you edit this ticket to fill in more detail > about which custom changes have not been ported yet? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1844) Support maven-style dependency resolution in sbt build
[ https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1844: --- Assignee: Apache Spark (was: Josh Rosen) > Support maven-style dependency resolution in sbt build > -- > > Key: SPARK-1844 > URL: https://issues.apache.org/jira/browse/SPARK-1844 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Patrick Wendell >Assignee: Apache Spark > > [Currently this is a brainstorm/wish - not sure it's possible] > Ivy/sbt and maven use fundamentally different strategies when transitive > dependencies conflict (i.e. when we have two copies of library Y in our > dependency graph on different versions). > This actually means our sbt and maven builds have been divergent for a long > time. > Ivy/sbt have a pluggable notion of a [conflict > manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java]. > The default chooses the newest version of the dependency. SBT [allows this > to be > changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager] > though. > Maven employs the [nearest > wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/] > policy, which means the version closest to the project root is chosen. > It would be nice to be able to have matching semantics in the builds. We > could do this by writing a conflict manager in sbt that mimics Maven's > behavior. The fact that IVY-813 has existed for 6 years without anyone doing > this makes me wonder if that is not possible or very hard :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14401) Switch to stock sbt-pom-reader plugin
[ https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236204#comment-15236204 ] Apache Spark commented on SPARK-14401: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/12310 > Switch to stock sbt-pom-reader plugin > - > > Key: SPARK-14401 > URL: https://issues.apache.org/jira/browse/SPARK-14401 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen > > Spark currently depends on a forked version of {{sbt-pom-reader}} which we > build from source. It would be great to port our modifications to the > upstream project so that we can migrate to the official version and stop > maintaining our fork. > [~scrapco...@gmail.com], could you edit this ticket to fill in more detail > about which custom changes have not been ported yet? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-1844) Support maven-style dependency resolution in sbt build
[ https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-1844: --- Assignee: Josh Rosen (was: Apache Spark) > Support maven-style dependency resolution in sbt build > -- > > Key: SPARK-1844 > URL: https://issues.apache.org/jira/browse/SPARK-1844 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Patrick Wendell >Assignee: Josh Rosen > > [Currently this is a brainstorm/wish - not sure it's possible] > Ivy/sbt and maven use fundamentally different strategies when transitive > dependencies conflict (i.e. when we have two copies of library Y in our > dependency graph on different versions). > This actually means our sbt and maven builds have been divergent for a long > time. > Ivy/sbt have a pluggable notion of a [conflict > manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java]. > The default chooses the newest version of the dependency. SBT [allows this > to be > changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager] > though. > Maven employs the [nearest > wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/] > policy, which means the version closest to the project root is chosen. > It would be nice to be able to have matching semantics in the builds. We > could do this by writing a conflict manager in sbt that mimics Maven's > behavior. The fact that IVY-813 has existed for 6 years without anyone doing > this makes me wonder if that is not possible or very hard :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14401) Switch to stock sbt-pom-reader plugin
[ https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14401: Assignee: (was: Apache Spark) > Switch to stock sbt-pom-reader plugin > - > > Key: SPARK-14401 > URL: https://issues.apache.org/jira/browse/SPARK-14401 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen > > Spark currently depends on a forked version of {{sbt-pom-reader}} which we > build from source. It would be great to port our modifications to the > upstream project so that we can migrate to the official version and stop > maintaining our fork. > [~scrapco...@gmail.com], could you edit this ticket to fill in more detail > about which custom changes have not been ported yet? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1844) Support maven-style dependency resolution in sbt build
[ https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236205#comment-15236205 ] Apache Spark commented on SPARK-1844: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/12310 > Support maven-style dependency resolution in sbt build > -- > > Key: SPARK-1844 > URL: https://issues.apache.org/jira/browse/SPARK-1844 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Patrick Wendell >Assignee: Josh Rosen > > [Currently this is a brainstorm/wish - not sure it's possible] > Ivy/sbt and maven use fundamentally different strategies when transitive > dependencies conflict (i.e. when we have two copies of library Y in our > dependency graph on different versions). > This actually means our sbt and maven builds have been divergent for a long > time. > Ivy/sbt have a pluggable notion of a [conflict > manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java]. > The default chooses the newest version of the dependency. SBT [allows this > to be > changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager] > though. > Maven employs the [nearest > wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/] > policy, which means the version closest to the project root is chosen. > It would be nice to be able to have matching semantics in the builds. We > could do this by writing a conflict manager in sbt that mimics Maven's > behavior. The fact that IVY-813 has existed for 6 years without anyone doing > this makes me wonder if that is not possible or very hard :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-1844) Support maven-style dependency resolution in sbt build
[ https://issues.apache.org/jira/browse/SPARK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-1844: --- Assignee: Josh Rosen (was: Prashant Sharma) > Support maven-style dependency resolution in sbt build > -- > > Key: SPARK-1844 > URL: https://issues.apache.org/jira/browse/SPARK-1844 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Patrick Wendell >Assignee: Josh Rosen > > [Currently this is a brainstorm/wish - not sure it's possible] > Ivy/sbt and maven use fundamentally different strategies when transitive > dependencies conflict (i.e. when we have two copies of library Y in our > dependency graph on different versions). > This actually means our sbt and maven builds have been divergent for a long > time. > Ivy/sbt have a pluggable notion of a [conflict > manager|http://grepcode.com/file/repo1.maven.org/maven2/org.apache.ivy/ivy/2.3.0/org/apache/ivy/plugins/conflict/ConflictManager.java]. > The default chooses the newest version of the dependency. SBT [allows this > to be > changed|http://www.scala-sbt.org/release/sxr/sbt/IvyInterface.scala.html#sbt;ConflictManager] > though. > Maven employs the [nearest > wins|http://techidiocy.com/maven-dependency-version-conflict-problem-and-resolution/] > policy, which means the version closest to the project root is chosen. > It would be nice to be able to have matching semantics in the builds. We > could do this by writing a conflict manager in sbt that mimics Maven's > behavior. The fact that IVY-813 has existed for 6 years without anyone doing > this makes me wonder if that is not possible or very hard :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
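As background for the conflict-manager discussion above, a build.sbt sketch of how sbt's pluggable resolution can be switched (sbt 0.13-era syntax); this only chooses among Ivy's built-in managers and does not by itself reproduce Maven's nearest-wins behavior.
{code}
// Default Ivy behavior: pick the newest version when transitive dependencies conflict.
conflictManager := ConflictManager.latestRevision

// Alternative: fail the build whenever versions of the same module conflict.
// conflictManager := ConflictManager.strict
{code}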
[jira] [Updated] (SPARK-14401) Switch to stock sbt-pom-reader plugin
[ https://issues.apache.org/jira/browse/SPARK-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14401: --- Summary: Switch to stock sbt-pom-reader plugin (was: Merge our sbt-pom-reader changes upstream) > Switch to stock sbt-pom-reader plugin > - > > Key: SPARK-14401 > URL: https://issues.apache.org/jira/browse/SPARK-14401 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen > > Spark currently depends on a forked version of {{sbt-pom-reader}} which we > build from source. It would be great to port our modifications to the > upstream project so that we can migrate to the official version and stop > maintaining our fork. > [~scrapco...@gmail.com], could you edit this ticket to fill in more detail > about which custom changes have not been ported yet? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14544) Spark UI is very slow in recent Chrome
Davies Liu created SPARK-14544: -- Summary: Spark UI is very slow in recent Chrome Key: SPARK-14544 URL: https://issues.apache.org/jira/browse/SPARK-14544 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu Once I run a complicated query or there are many queries in the SQL tab, the page is really slow in Chrome 49, but fast in Safari/Firefox. Given that many users are on Chrome, we should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14298) LDA should support disable checkpoint
[ https://issues.apache.org/jira/browse/SPARK-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-14298. --- Resolution: Fixed Fix Version/s: 1.6.2 1.5.3 > LDA should support disable checkpoint > - > > Key: SPARK-14298 > URL: https://issues.apache.org/jira/browse/SPARK-14298 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Yanbo Liang >Assignee: Yanbo Liang >Priority: Minor > Fix For: 1.5.3, 1.6.2, 2.0.0 > > > LDA should support disabling checkpointing by setting checkpointInterval = -1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer
[ https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14542: Assignee: (was: Apache Spark) > PipeRDD should allow configurable buffer size for the stdin writer > --- > > Key: SPARK-14542 > URL: https://issues.apache.org/jira/browse/SPARK-14542 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.1 >Reporter: Sital Kedia >Priority: Minor > > Currently PipedRDD internally uses PrintWriter to write data to the stdin of > the piped process, which by default uses a BufferedWriter of buffer size 8k. > In our experiment, we have seen that 8k buffer size is too small and the job > spends significant amount of CPU time in system calls to copy the data. We > should have a way to configure the buffer size for the writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer
[ https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14542: Assignee: Apache Spark > PipeRDD should allow configurable buffer size for the stdin writer > --- > > Key: SPARK-14542 > URL: https://issues.apache.org/jira/browse/SPARK-14542 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.1 >Reporter: Sital Kedia >Assignee: Apache Spark >Priority: Minor > > Currently PipedRDD internally uses PrintWriter to write data to the stdin of > the piped process, which by default uses a BufferedWriter of buffer size 8k. > In our experiment, we have seen that 8k buffer size is too small and the job > spends significant amount of CPU time in system calls to copy the data. We > should have a way to configure the buffer size for the writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14542) PipeRDD should allow configurable buffer size for the stdin writer
[ https://issues.apache.org/jira/browse/SPARK-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236156#comment-15236156 ] Apache Spark commented on SPARK-14542: -- User 'sitalkedia' has created a pull request for this issue: https://github.com/apache/spark/pull/12309 > PipeRDD should allow configurable buffer size for the stdin writer > --- > > Key: SPARK-14542 > URL: https://issues.apache.org/jira/browse/SPARK-14542 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.1 >Reporter: Sital Kedia >Priority: Minor > > Currently PipedRDD internally uses PrintWriter to write data to the stdin of > the piped process, which by default uses a BufferedWriter of buffer size 8k. > In our experiment, we have seen that 8k buffer size is too small and the job > spends significant amount of CPU time in system calls to copy the data. We > should have a way to configure the buffer size for the writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
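A small Scala sketch of the change being discussed: put a larger BufferedWriter in front of the child process's stdin (8 KB is BufferedWriter's default). The "cat" process, the 1 MB figure, and the variable name are illustrative assumptions, not the actual PipedRDD code or configuration key.
{code}
import java.io.{BufferedWriter, OutputStreamWriter, PrintWriter}

val proc = new ProcessBuilder("cat").start()   // stand-in for the process PipedRDD launches
val stdinBufferSize = 1024 * 1024              // hypothetical value; the point is to make this configurable

val stdin = new PrintWriter(
  new BufferedWriter(new OutputStreamWriter(proc.getOutputStream), stdinBufferSize))
stdin.println("some record")
stdin.close()
{code}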
[jira] [Created] (SPARK-14543) SQL/Hive insertInto has unexpected results
Ryan Blue created SPARK-14543: - Summary: SQL/Hive insertInto has unexpected results Key: SPARK-14543 URL: https://issues.apache.org/jira/browse/SPARK-14543 Project: Spark Issue Type: Bug Components: SQL Reporter: Ryan Blue The Hive write path adds a pre-insertion cast (projection) to reconcile incoming data columns with the outgoing table schema. Columns are matched by position and casts are inserted to reconcile the two column schemas. When columns aren't correctly aligned, this causes unexpected results. I ran into this by not using a correct {{partitionBy}} call (addressed by SPARK-14459), which caused an error message that an int could not be cast to an array. However, if the columns are vaguely compatible, for example string and float, then no error or warning is produced and data is written to the wrong columns using unexpected casts (string -> bigint -> float). A real-world use case that will hit this is when a table definition changes by adding a column in the middle of a table. Spark SQL statements that copied from that table to a destination table will then map the columns differently but insert casts that mask the problem. The last column's data will be dropped without a reliable warning for the user. This highlights a few problems: * Too many or too few incoming data columns should cause an AnalysisException to be thrown * Only "safe" casts should be inserted automatically, like int -> long, using UpCast * Pre-insertion casts currently ignore extra columns by using zip * The pre-insertion cast logic differs between Hive's MetastoreRelation and LogicalRelation Also, I think there should be an option to match input data to output columns by name. The API allows operations on tables, which hide the column resolution problem. It's easy to copy from one table to another without listing the columns, and in the API it is common to work with columns by name rather than by position. I think the API should add a way to match columns by name, which is closer to what users expect. I propose adding something like this: {code} CREATE TABLE src (id: bigint, count: int, total: bigint) CREATE TABLE dst (id: bigint, total: bigint, count: int) sqlContext.table("src").write.byName.insertInto("dst") {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
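To make the positional-matching pitfall concrete, a short sketch using the src/dst tables from the description; {{byName}} is the proposed API, not something that exists yet.
{code}
// dst columns are (id, total, count), so a positional insert from src (id, count, total)
// silently writes src.count into dst.total and src.total into dst.count.
sqlContext.table("src").write.insertInto("dst")               // current behavior: match by position

// Proposed: opt in to matching columns by name instead.
// sqlContext.table("src").write.byName.insertInto("dst")
{code}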
[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14521: --- Priority: Blocker (was: Critical) > StackOverflowError in Kryo when executing TPC-DS Query27 > > > Key: SPARK-14521 > URL: https://issues.apache.org/jira/browse/SPARK-14521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Rajesh Balamohan >Priority: Blocker > > Build details: Spark build from master branch (Apr-10) > DataSet:TPC-DS at 200 GB scale in Parq format stored in hive. > Client: $SPARK_HOME/bin/beeline > Query: TPC-DS Query27 > spark.sql.sources.fileScan=true (this is the default value anyways) > Exception: > {noformat} > Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14521: --- Affects Version/s: 2.0.0 > StackOverflowError in Kryo when executing TPC-DS Query27 > > > Key: SPARK-14521 > URL: https://issues.apache.org/jira/browse/SPARK-14521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Rajesh Balamohan > > Build details: Spark build from master branch (Apr-10) > DataSet: TPC-DS at 200 GB scale in Parquet format stored in Hive. > Client: $SPARK_HOME/bin/beeline > Query: TPC-DS Query27 > spark.sql.sources.fileScan=true (this is the default value anyway) > Exception: > {noformat} > Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108) > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236094#comment-15236094 ] Josh Rosen commented on SPARK-14521: Downgrading to Kryo 2 is not an option, so we'll have to fix this. > StackOverflowError in Kryo when executing TPC-DS Query27 > > > Key: SPARK-14521 > URL: https://issues.apache.org/jira/browse/SPARK-14521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Rajesh Balamohan > > Build details: Spark build from master branch (Apr-10) > DataSet: TPC-DS at 200 GB scale in Parquet format stored in Hive. > Client: $SPARK_HOME/bin/beeline > Query: TPC-DS Query27 > spark.sql.sources.fileScan=true (this is the default value anyway) > Exception: > {noformat} > Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108) > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
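The trace quoted in this thread is Kryo's usual deep-recursion failure mode: CollectionSerializer.write, FieldSerializer.write and Kryo.writeClassAndObject call one another once per nesting level of the object graph being serialized for the broadcast exchange, so a sufficiently deep structure exhausts the thread's stack. The sketch below is not taken from the ticket; it is a minimal standalone reproduction of that mechanism, assuming a Kryo 3.x jar on the classpath (the comment above implies Spark is already past Kryo 2). The object name and the nesting depth of 100000 are arbitrary illustrative choices.

{code:scala}
import java.io.ByteArrayOutputStream
import java.util.ArrayList

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output

// Hypothetical repro, not Spark code: deeply nested java.util collections
// drive Kryo's CollectionSerializer into unbounded recursion.
object KryoDeepNestingRepro {
  def main(args: Array[String]): Unit = {
    // Build an ArrayList nested ~100000 levels deep: list(list(list(...))).
    var nested = new ArrayList[AnyRef]()
    for (_ <- 1 to 100000) {
      val outer = new ArrayList[AnyRef]()
      outer.add(nested)
      nested = outer
    }

    val kryo = new Kryo()
    val output = new Output(new ByteArrayOutputStream())

    // Each nesting level adds another writeClassAndObject ->
    // CollectionSerializer.write frame pair, so this call throws
    // java.lang.StackOverflowError on a default-sized thread stack,
    // matching the broadcast-exchange trace in the ticket.
    kryo.writeClassAndObject(output, nested)
    output.close()
  }
}
{code}

Raising the default stack size of driver-side threads (for example an -Xss value passed via spark-submit's --driver-java-options) can postpone the overflow for a given query, but it only buys headroom; per the comment above, the recursive serialization path itself is what needs to be addressed.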
[jira] [Updated] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS Query27
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14521: --- Priority: Critical (was: Major) > StackOverflowError in Kryo when executing TPC-DS Query27 > > > Key: SPARK-14521 > URL: https://issues.apache.org/jira/browse/SPARK-14521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Rajesh Balamohan >Priority: Critical > > Build details: Spark build from master branch (Apr-10) > DataSet: TPC-DS at 200 GB scale in Parquet format stored in Hive. > Client: $SPARK_HOME/bin/beeline > Query: TPC-DS Query27 > spark.sql.sources.fileScan=true (this is the default value anyway) > Exception: > {noformat} > Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108) > at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org