[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784796#comment-15784796 ] Jork Zijlstra edited comment on SPARK-19012 at 12/29/16 7:56 AM: - Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it. {code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} Later I want to check if the tableOrViewName has already been created and not call createOrReplaceTempView every time, but that is just a performance improvement. was (Author: jzijlstra): Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it.
{code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the first char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE'
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784796#comment-15784796 ] Jork Zijlstra commented on SPARK-19012: --- Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it. {code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the first char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 
'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
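To make the failure mode concrete, here is a minimal sketch (the DataFrame and names are illustrative, not taken from the reporter's job); it shows a digit-leading view name failing to parse, and the underscore-prefix workaround from the comment above:
{code}
// Minimal sketch; sparkSession is an existing SparkSession.
val df = sparkSession.range(1).toDF("value")

df.createOrReplaceTempView("a1") // works: name starts with a letter
// df.createOrReplaceTempView("1a") // fails with the ParseException quoted above

// Workaround from the comment: prefix the hash-derived name so it can never start with a digit.
val viewName = "_" + Math.abs("SELECT * FROM [TABLE] LIMIT 1".hashCode).toString
df.createOrReplaceTempView(viewName)
{code}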
[jira] [Assigned] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19021: Assignee: (was: Apache Spark) > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19021: Assignee: Apache Spark > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Assignee: Apache Spark >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784772#comment-15784772 ] Apache Spark commented on SPARK-19021: -- User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/16432 > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
Saisai Shao created SPARK-19021: --- Summary: Generalize HDFSCredentialProvider to support non-HDFS secure FS Key: SPARK-19021 URL: https://issues.apache.org/jira/browse/SPARK-19021 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.1.0 Reporter: Saisai Shao Currently Spark can only get the token renewal interval from secure HDFS (hdfs://); if Spark runs with other secure file systems such as webHDFS (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not obtain renewal intervals from them, which makes Spark unable to work with those secure clusters. So instead of only checking the HDFS token, we should generalize to support different {{DelegationTokenIdentifier}}s. This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-19021: Priority: Minor (was: Major) > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
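As a rough sketch of the generalization being proposed (the helper functions here are illustrative, not the actual patch; the Hadoop classes and methods are the public API): obtain delegation tokens from whatever secure FileSystem backs each path, and read renewal-related fields from any identifier extending {{AbstractDelegationTokenIdentifier}} instead of matching only the HDFS-specific class.
{code}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier

// hdfs://, webhdfs://, wasb://, adl://, ... all go through the same call.
def obtainTokens(paths: Seq[Path], renewer: String, conf: Configuration): Credentials = {
  val creds = new Credentials()
  paths.foreach(p => p.getFileSystem(conf).addDelegationTokens(renewer, creds))
  creds
}

// Read issue dates from any delegation-token identifier, not just HDFS's.
def issueDates(creds: Credentials): Seq[Long] =
  creds.getAllTokens.asScala.toSeq.flatMap { token =>
    token.decodeIdentifier() match {
      case id: AbstractDelegationTokenIdentifier => Some(id.getIssueDate)
      case _ => None // tokens without a decodable delegation identifier
    }
  }
{code}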
[jira] [Comment Edited] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784678#comment-15784678 ] Song Jun edited comment on SPARK-18930 at 12/29/16 6:44 AM: From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? [~srowen] was (Author: windpiger): From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can imagine, these numbers are the counts of records. But! When I do select * > from temp.test_partitioning_4, the data is correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784678#comment-15784678 ] Song Jun commented on SPARK-18930: -- From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can imagine, these numbers are the counts of records. But! When I do select * > from temp.test_partitioning_4, the data is correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
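For illustration, here is the insert from the description rewritten per that rule (same tables and columns as the report; the only change is the column order), which would write the intended partitions:
{code}
// Dynamic partition values are taken positionally from the LAST columns of the
// SELECT, so the partition column 'day' must come last, not first.
sparkSession.sql("""
  INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
  SELECT count(*) AS num, day
  FROM hss.session
  WHERE year = 2016 AND month = 4
  GROUP BY day
""")
{code}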
[jira] [Assigned] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19020: Assignee: Apache Spark > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19020: Assignee: (was: Apache Spark) > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784674#comment-15784674 ] Apache Spark commented on SPARK-19020: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16431 > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
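The ticket itself carries no description; for orientation, here is a hedged sketch of the textbook estimate a rule like this computes (this mirrors the standard cost-based approach, not necessarily the PR's exact code):
{code}
// Rows out of GROUP BY c1, ..., cn: the product of the grouping columns'
// distinct counts, capped by the child's row count.
def estimateAggregateRows(childRows: BigInt, distinctCounts: Seq[BigInt]): BigInt =
  distinctCounts.foldLeft(BigInt(1))(_ * _).min(childRows)

// e.g. 1,000,000 input rows, GROUP BY gender (ndv 2) and country (ndv 50)
// => min(1000000, 2 * 50) = 100 estimated output rows
{code}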
[jira] [Resolved] (SPARK-18567) Simplify CreateDataSourceTableAsSelectCommand
[ https://issues.apache.org/jira/browse/SPARK-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-18567. -- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 15996 [https://github.com/apache/spark/pull/15996] > Simplify CreateDataSourceTableAsSelectCommand > - > > Key: SPARK-18567 > URL: https://issues.apache.org/jira/browse/SPARK-18567 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Comment: was deleted (was: Design doc v1) > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: (was: de-duplicating subqueries.pdf) > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
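A small example of the duplication the description refers to ({{t}} and {{big}} are hypothetical tables): both scalar subqueries below have identical plans and results, yet without de-duplication each is executed separately.
{code}
sparkSession.sql("""
  SELECT *
  FROM t
  WHERE a > (SELECT avg(x) FROM big)
     OR b > (SELECT avg(x) FROM big)
""")
{code}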
[jira] [Created] (SPARK-19020) Cardinality estimation of aggregate operator
Zhenhua Wang created SPARK-19020: Summary: Cardinality estimation of aggregate operator Key: SPARK-19020 URL: https://issues.apache.org/jira/browse/SPARK-19020 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Zhenhua Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation of project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation of project operator (was: Cardinality estimation for project operator) > Cardinality estimation of project operator > -- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation for project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation for project operator (was: Cardinality estimation project operator) > Cardinality estimation for project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17077: Assignee: Apache Spark > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17077: Assignee: (was: Apache Spark) > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784384#comment-15784384 ] Apache Spark commented on SPARK-17077: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16430 > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation project operator (was: Cardinality estimation of group-by, project, union, etc.) > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16213) Reduce runtime overhead of a program that creates a primitive array in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16213. - Resolution: Fixed Assignee: Kazuaki Ishizaki Fix Version/s: 2.2.0 > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > - > > Key: SPARK-16213 > URL: https://issues.apache.org/jira/browse/SPARK-16213 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0 > > > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > When a program creates an array in DataFrame, the code generator creates > boxing operations. If an array is of a primitive type, there are some > opportunities for optimization in the generated code to reduce runtime overhead. > Here is a simple example whose generated code contains boxing operations: > {code} > val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF > df.selectExpr("Array(value + 1.1d, value + 2.2d)").show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
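A plain-Scala sketch of the overhead in question (illustrative, not the generated code itself): typing the array as {{Array[Any]}} forces one {{java.lang.Double}} box per element, which a primitive-specialized path avoids.
{code}
val value = 0.0d
val boxed: Array[Any] = Array(value + 1.1d, value + 2.2d)        // two boxed Doubles allocated
val primitive: Array[Double] = Array(value + 1.1d, value + 2.2d) // raw doubles, no per-element allocation
{code}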
[jira] [Assigned] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19019: Assignee: Apache Spark > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19019: Assignee: (was: Apache Spark) > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784329#comment-15784329 ] Apache Spark commented on SPARK-19019: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/16429 > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19019) PySpark does not work with Python 3.6.0
Hyukjin Kwon created SPARK-19019: Summary: PySpark does not work with Python 3.6.0 Key: SPARK-19019 URL: https://issues.apache.org/jira/browse/SPARK-19019 Project: Spark Issue Type: Bug Components: PySpark Reporter: Hyukjin Kwon Priority: Critical Currently, PySpark does not work with Python 3.6.0. Running {{./bin/pyspark}} simply throws the error as below: {code} Traceback (most recent call last): File ".../spark/python/pyspark/shell.py", line 30, in <module> import pyspark File ".../spark/python/pyspark/__init__.py", line 46, in <module> from pyspark.context import SparkContext File ".../spark/python/pyspark/context.py", line 36, in <module> from pyspark.java_gateway import launch_gateway File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> from py4j.java_gateway import java_import, JavaGateway, GatewayClient File "<frozen importlib._bootstrap>", line 961, in _find_and_load File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 646, in _load_unlocked File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module> File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module> import pkgutil File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module> ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple cls = _old_namedtuple(*args, **kwargs) TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module' {code} The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 as the error says, and the cause seems to be that the arguments of {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 (see https://bugs.python.org/issue25628). We currently copy this function via {{types.FunctionType}}, which does not set the default values of keyword-only arguments (meaning {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing values in the function (non-bound arguments). This ends up as below: {code} import types import collections def _copy_func(f): return types.FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__) _old_namedtuple = _copy_func(collections.namedtuple) _old_namedtuple("a", "b") _old_namedtuple("a") {code} If we call as below: {code} >>> _old_namedtuple("a", "b") Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module' {code} It throws an exception as above because {{__kwdefaults__}} for required keyword arguments seems to be unset in the copied function. So, if we give explicit values for these, {code} >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) <class '__main__.a'> {code} It works fine. It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19018: Assignee: (was: Apache Spark) > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19018: Assignee: Apache Spark > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen >Assignee: Apache Spark > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784278#comment-15784278 ] Apache Spark commented on SPARK-19018: -- User 'cjuexuan' has created a pull request for this issue: https://github.com/apache/spark/pull/16428 > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19018) spark csv writer charset support
todd.chen created SPARK-19018: - Summary: spark csv writer charset support Key: SPARK-19018 URL: https://issues.apache.org/jira/browse/SPARK-19018 Project: Spark Issue Type: Bug Components: SQL Reporter: todd.chen If we write a DataFrame to CSV, the default charset is UTF-8 and we can't change it, unlike reading CSV where we can set `encoding` in the params. So I think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
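A short sketch of the asymmetry being reported (paths and charset are illustrative): the reader already accepts an {{encoding}} option, while the writer, as of this report, always emits UTF-8.
{code}
val df = sparkSession.read.option("encoding", "GBK").csv("/data/in.csv")
df.write.csv("/data/out.csv") // always written as UTF-8; no charset option is honored here
{code}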
[jira] [Updated] (SPARK-19007) Speedup and optimize the GradientBoostedTrees in the "data>memory" scene
[ https://issues.apache.org/jira/browse/SPARK-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-19007: -- Component/s: (was: MLlib) > Speedup and optimize the GradientBoostedTrees in the "data>memory" scene > > > Key: SPARK-19007 > URL: https://issues.apache.org/jira/browse/SPARK-19007 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.0.1, 2.0.2, 2.1.0 > Environment: A CDH cluster consisting of 3 Red Hat servers (120G > memory, 40 cores, 43TB disk per server). >Reporter: zhangdenghui >Priority: Minor > Original Estimate: 168h > Remaining Estimate: 168h > > Test data: 80G of CTR training data from criteolabs > (http://criteolabs.wpengine.com/downloads/download-terabyte-click-logs/); I used > 1 of the 24 days' data. Some features needed to be replaced by newly generated > continuous features; the way to generate the new features follows the approach > mentioned in the xgboost paper. > Resources allocated: Spark on YARN, 20 executors, 8G memory and 2 cores per > executor. > Parameters: numIterations 10, maxdepth 8, the rest of the parameters are defaults. > I tested the GradientBoostedTrees algorithm in mllib using the 80G CTR data > mentioned above. > It took 1.5 hours in total, and I found many task failures after 6 or 7 GBT > rounds. Without these task failures and task retries it could be much faster, > which would save about half the time. I think this is caused by the RDD named > predError in the while loop of the boost method in GradientBoostedTrees.scala: > the lineage of the RDD named predError grows after every GBT round, and then it > causes failures like this: (ExecutorLostFailure (executor 6 exited caused by one > of the running tasks) Reason: Container killed by YARN for exceeding memory > limits. 10.2 GB of 10 GB physical memory used. Consider boosting > spark.yarn.executor.memoryOverhead.). > I tried boosting spark.yarn.executor.memoryOverhead, but the memory it needed > is too much (even increasing the memory by half can't solve the problem), so I > think it's not a proper method. > Setting the predCheckpoint interval smaller would cut the lineage, but it > increases IO cost a lot. > I tried another way to solve this problem: I persisted the RDD named predError > every round, used pre_predError to record the previous RDD, and unpersisted it > because it is useless afterwards. > Finally it took about 45 min with my method, no task failures occurred, and no > extra memory was added. > So when the data is much larger than memory, my little improvement can speed > up GradientBoostedTrees by 1-2x. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
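A hedged sketch of the pattern the reporter describes (helper names such as {{updatePredError}} are hypothetical, standing in for the GradientBoostedTrees internals): persist each round's prediction-error RDD and unpersist the previous one, so the lineage and retry cost stop growing with the number of rounds.
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

var predError: RDD[(Double, Double)] = computeInitialPredError(input) // hypothetical helper
predError.persist(StorageLevel.MEMORY_AND_DISK)

for (m <- 1 until numIterations) {
  val prevPredError = predError
  predError = updatePredError(prevPredError, trees(m)) // hypothetical helper for one round
  predError.persist(StorageLevel.MEMORY_AND_DISK)
  predError.count()                         // materialize before dropping the previous round
  prevPredError.unpersist(blocking = false) // previous round's errors are no longer needed
}
{code}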
[jira] [Commented] (SPARK-18948) Add Mean Percentile Rank metric for ranking algorithms
[ https://issues.apache.org/jira/browse/SPARK-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784216#comment-15784216 ] Joseph K. Bradley commented on SPARK-18948: --- Thanks [~danilo.ascione] for suggesting this. A few initial comments: * We are not accepting new features in the RDD-based API (spark.mllib), but only in the DataFrame-based API (spark.ml). If you'd like to get this in, it will need to be rewritten for the DataFrame-based API. * (minor) Please don't set the Shepherd field; committers will use it to track releases. I haven't been able to check out the PR yet, but if broader discussions are being brought up there, then let's discuss the issues on this JIRA first before further implementation work. Thanks! > Add Mean Percentile Rank metric for ranking algorithms > -- > > Key: SPARK-18948 > URL: https://issues.apache.org/jira/browse/SPARK-18948 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Danilo Ascione > > Add the Mean Percentile Rank (MPR) metric for ranking algorithms, as > described in the paper: > Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for Implicit > Feedback Datasets.” In 2008 Eighth IEEE International Conference on Data > Mining, 263–72, 2008. doi:10.1109/ICDM.2008.22. > (http://yifanhu.net/PUB/cf.pdf) (NB: MPR is called "Expected percentile rank" > in the paper) > The ALS algorithm for implicit feedback in Spark ML is based on the same > paper. > Spark ML lacks an implementation of an appropriate metric for implicit > feedback, so the MPR metric can fulfill this use case. > This implementation adds the metric to the RankingMetrics class under > org.apache.spark.mllib.evaluation (SPARK-3568), and it uses the same input > (prediction and label pairs). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
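For reference, a sketch of the metric as defined in the cited paper (my reading of it, not the PR's code): MPR = sum(r_ui * rank_ui) / sum(r_ui), where rank_ui in [0, 1] is item i's percentile position in user u's ranked list; lower is better, and a random ranker scores about 0.5.
{code}
// users(u) = relevance values r_ui, already ordered by the model's ranking for user u
def meanPercentileRank(users: Seq[Array[Double]]): Double = {
  var num = 0.0
  var den = 0.0
  for (rs <- users; i <- rs.indices) {
    val rank = if (rs.length == 1) 0.0 else i.toDouble / (rs.length - 1) // percentile in [0, 1]
    num += rs(i) * rank
    den += rs(i)
  }
  if (den == 0.0) 0.0 else num / den
}
{code}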
[jira] [Updated] (SPARK-18948) Add Mean Percentile Rank metric for ranking algorithms
[ https://issues.apache.org/jira/browse/SPARK-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18948: -- Shepherd: (was: Xiangrui Meng) > Add Mean Percentile Rank metric for ranking algorithms > -- > > Key: SPARK-18948 > URL: https://issues.apache.org/jira/browse/SPARK-18948 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Danilo Ascione > > Add the Mean Percentile Rank (MPR) metric for ranking algorithms, as > described in the paper: > Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for Implicit > Feedback Datasets.” In 2008 Eighth IEEE International Conference on Data > Mining, 263–72, 2008. doi:10.1109/ICDM.2008.22. > (http://yifanhu.net/PUB/cf.pdf) (NB: MPR is called "Expected percentile rank" > in the paper) > The ALS algorithm for implicit feedback in Spark ML is based on the same > paper. > Spark ML lacks an implementation of an appropriate metric for implicit > feedback, so the MPR metric can fulfill this use case. > This implementation adds the metric to the RankingMetrics class under > org.apache.spark.mllib.evaluation (SPARK-3568), and it uses the same input > (prediction and label pairs). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18929) Add Tweedie distribution in GLM
[ https://issues.apache.org/jira/browse/SPARK-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18929: -- Affects Version/s: (was: 2.0.2) > Add Tweedie distribution in GLM > --- > > Key: SPARK-18929 > URL: https://issues.apache.org/jira/browse/SPARK-18929 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Wayne Zhang > Labels: features > Original Estimate: 72h > Remaining Estimate: 72h > > I propose to add the full Tweedie family into the GeneralizedLinearRegression > model. The Tweedie family is characterized by a power variance function. > The currently supported Gaussian, Poisson and Gamma families are special > cases of the [Tweedie|https://en.wikipedia.org/wiki/Tweedie_distribution]. > I propose to add support for the other distributions: > * compound Poisson: 1 < variancePower < 2. This one is widely used to model > zero-inflated continuous distributions. > * positive stable: variancePower > 2 and variancePower != 3. Used to model > extreme values. > * inverse Gaussian: variancePower = 3. > The Tweedie family is supported in most statistical packages such as R > (statmod), SAS, h2o etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
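For context, the property that ties the proposal's list together (standard Tweedie background rather than Spark code): a Tweedie distribution has variance {{phi * mu^variancePower}}, and each listed family corresponds to a particular power.
{code}
def tweedieVariance(mu: Double, phi: Double, variancePower: Double): Double =
  phi * math.pow(mu, variancePower)

// variancePower: 0 -> Gaussian, 1 -> Poisson, (1, 2) -> compound Poisson,
// 2 -> Gamma, 3 -> inverse Gaussian
{code}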
[jira] [Commented] (SPARK-18862) Split SparkR mllib.R into multiple files
[ https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784190#comment-15784190 ] Joseph K. Bradley commented on SPARK-18862: --- I like the chosen organization too! > Split SparkR mllib.R into multiple files > > > Key: SPARK-18862 > URL: https://issues.apache.org/jira/browse/SPARK-18862 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang > > SparkR mllib.R is getting bigger as we add more ML wrappers, so I'd like to > split it into multiple files to make it easier to maintain: > * mllibClassification.R > * mllibRegression.R > * mllibClustering.R > * mllibFeature.R > or: > * mllib/classification.R > * mllib/regression.R > * mllib/clustering.R > * mllib/features.R > By R convention, the first way is preferred. And I'm not sure whether R > supports the second way of organizing files (will check later). Please let me > know your preference. I think the start of a new release cycle is a good > opportunity to do this, since it will involve fewer conflicts. If this > proposal is approved, I can work on it. > cc [~felixcheung] [~josephkb] [~mengxr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16552) Store the Inferred Schemas into External Catalog Tables when Creating Tables
[ https://issues.apache.org/jira/browse/SPARK-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784183#comment-15784183 ] Xiao Li commented on SPARK-16552:
---
[~yhuai] Yeah, see the discussion in https://github.com/apache/spark/pull/15983#issuecomment-267836485. I think we need to document the behavior changes.
> Store the Inferred Schemas into External Catalog Tables when Creating Tables
>
> Key: SPARK-16552
> URL: https://issues.apache.org/jira/browse/SPARK-16552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
> Assignee: Xiao Li
> Labels: release_notes, releasenotes
> Fix For: 2.1.0
>
> Currently, in Spark SQL, the initial creation of a schema falls into two groups. This applies to both Hive tables and Data Source tables:
> Group A. Users specify the schema.
> Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema of the SELECT clause. For example,
> {noformat}
> CREATE TABLE tab STORED AS TEXTFILE
> AS SELECT * from input
> {noformat}
> Case 2 CREATE TABLE: users explicitly specify the schema. For example,
> {noformat}
> CREATE TABLE jsonTable (_1 string, _2 string)
> USING org.apache.spark.sql.json
> {noformat}
> Group B. Spark SQL infers the schema at runtime.
> Case 3 CREATE TABLE: users do not specify the schema, but the path to the file location. For example,
> {noformat}
> CREATE TABLE jsonTable
> USING org.apache.spark.sql.json
> OPTIONS (path '${tempDir.getCanonicalPath}')
> {noformat}
> Currently, Spark SQL does not store the inferred schema in the external catalog for the cases in Group B. When users refresh the metadata cache, or access the table for the first time after (re-)starting Spark, Spark SQL infers the schema and stores the info in the metadata cache to improve the performance of subsequent metadata requests. However, the runtime schema inference could cause undesirable schema changes after each reboot of Spark.
> It is desirable to store the inferred schema in the external catalog when creating the table. When users intend to refresh the schema, they issue `REFRESH TABLE`. Spark SQL will then infer the schema again based on the previously specified table location and update/refresh the schema in the external catalog and metadata cache.
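For example, once the inferred schema is persisted, a change in the underlying files would be picked up only when explicitly requested, rather than on every restart (illustrative only, reusing the jsonTable name from the examples above):
{noformat}
REFRESH TABLE jsonTable
{noformat}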
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784174#comment-15784174 ] Apache Spark commented on SPARK-19012:
--
User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/16427
[jira] [Assigned] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19012:
--
Assignee: Apache Spark
[jira] [Assigned] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19012:
--
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784169#comment-15784169 ] Dongjoon Hyun commented on SPARK-19012:
---
In the API docs and many other places, `createOrReplaceTempView` was assumed not to throw any exceptions. It seems we need to discuss this on my PR.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784083#comment-15784083 ] Dongjoon Hyun commented on SPARK-19012:
---
Thank you for the decision. Yep, I'll make the PR like that.
[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784074#comment-15784074 ] Herman van Hovell edited comment on SPARK-19012 at 12/29/16 12:21 AM:
--
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain how you are using this?
[~dongjoon] let's just make the exception better for now.

was (Author: hvanhovell):
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain where you are using this.
[~dongjoon] let's just make the exception better for now.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784074#comment-15784074 ] Herman van Hovell commented on SPARK-19012:
---
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain where you are using this.
[~dongjoon] let's just make the exception better for now.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784062#comment-15784062 ] Dongjoon Hyun commented on SPARK-19012:
---
Ur, actually, we already support `createOrReplaceTempView("`1`")`.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784056#comment-15784056 ] Dongjoon Hyun commented on SPARK-19012:
---
BTW, [~hvanhovell], I found the existing related issue and test cases.
{code}
test("SPARK-12982: Add table name validation in temp table registration") {
  val df = Seq("foo", "bar").map(Tuple1.apply).toDF("col")
  // invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("t~"))
  // valid table name test as below
  df.createOrReplaceTempView("table1")
  // another invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("#$@sum"))
  // another invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("table!#"))
}
{code}
To be consistent with this, we should throw AnalysisException on `createOrReplaceTempView("1")`. So, what we want here is to support `createOrReplaceTempView("`1`")`. Did I understand correctly?
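In other words, callers who need an arbitrary name would quote it themselves, along the lines of the sketch below. This is only an illustration of the backtick-quoting idea under discussion, assuming embedded backticks are escaped by doubling them (per the BACKQUOTED_IDENTIFIER grammar rule); it is not the actual patch:
{code}
// Sketch only: wrap an arbitrary view name in backticks so the parser
// treats it as a quoted identifier; doubling embedded backticks is an
// assumption based on the BACKQUOTED_IDENTIFIER rule.
def quoteIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"

df.createOrReplaceTempView(quoteIdentifier("1")) // registers `1`
sparkSession.sql(s"SELECT * FROM ${quoteIdentifier("1")}").show()
{code}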
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783990#comment-15783990 ] Dongjoon Hyun commented on SPARK-19012:
---
No problem. However, we still need to raise AnalysisException on an empty table name, ``.
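In test form, that extra case might look like the following one-liner, written in the style of the SPARK-12982 test quoted above (an assumption, not committed code):
{code}
// Sketch: an empty name should still be rejected, since quoting it would
// only produce the invalid identifier ``.
intercept[AnalysisException](df.createOrReplaceTempView(""))
{code}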
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783992#comment-15783992 ] Dongjoon Hyun commented on SPARK-19012:
---
+1
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783988#comment-15783988 ] Herman van Hovell commented on SPARK-19012:
---
Yeah, maybe a bit more subtle than that (we need to escape backticks in the name).
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783983#comment-15783983 ] Dongjoon Hyun commented on SPARK-19012:
---
Oh, you mean always wrap the name with backticks, right?
[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783966#comment-15783966 ] Herman van Hovell edited comment on SPARK-19012 at 12/28/16 11:23 PM:
--
[~dongjoon] Could you make a PR that puts the name in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?

was (Author: hvanhovell):
[~dongjoon] Could you make a PR that puts the code in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783966#comment-15783966 ] Herman van Hovell commented on SPARK-19012:
---
[~dongjoon] Could you make a PR that puts the code in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783958#comment-15783958 ] Dongjoon Hyun commented on SPARK-19012: --- Hi, [~hvanhovell] and [~jzijlstra]. I'll make a PR to raise `AnalysisException` instead. > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
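A rough sketch of what such a guard could look like (illustrative only, not the actual PR; note that AnalysisException's constructor is visible only inside the org.apache.spark.sql package, so the check would live there):
{code}
import org.apache.spark.sql.AnalysisException

// Hypothetical validation run before the name reaches the parser, so the
// user gets one clear message instead of the full expected-token dump.
def validateViewName(name: String): Unit = {
  if (name.headOption.exists(_.isDigit)) {
    throw new AnalysisException(
      s"Invalid view name '$name': identifiers may not start with a digit " +
        "unless quoted with backticks.")
  }
}
{code}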
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783937#comment-15783937 ] Herman van Hovell commented on SPARK-19017: --- [~nsyca] Why is this incorrect? If I rewrite the NOT IN into a WHERE clause this would become: {noformat} select * from t1 where (a1 <> 1 AND b1 <> NULL) {noformat} That WHERE clause would evaluate to NULL, and it would never return a result. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
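For what it's worth, the two possible expansions behave differently under three-valued logic; a spark-shell sketch using the values from the description below (t1 = (2, 1), t2 = (1, null)):
{code}
// Tuple inequality NOT (a1 = a2 AND b1 = b2) is, by De Morgan,
// (a1 <> a2 OR b1 <> b2):   TRUE OR UNKNOWN  -> TRUE    (row qualifies)
// The conjunctive form (a1 <> a2 AND b1 <> b2) instead gives
//                           TRUE AND UNKNOWN -> UNKNOWN (row filtered)
spark.sql("SELECT (2 <> 1 OR 1 <> NULL) AS disjunctive, " +
  "(2 <> 1 AND 1 <> NULL) AS conjunctive").show()
// disjunctive: true, conjunctive: null
{code}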
[jira] [Updated] (SPARK-17847) Reduce shuffled data size of GaussianMixture & copy the implementation from mllib to ml
[ https://issues.apache.org/jira/browse/SPARK-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-17847: -- Target Version/s: 2.2.0 > Reduce shuffled data size of GaussianMixture & copy the implementation from > mllib to ml > --- > > Key: SPARK-17847 > URL: https://issues.apache.org/jira/browse/SPARK-17847 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > Copy {{GaussianMixture}} implementation from mllib to ml, then we can add new > features to it. > I left mllib {{GaussianMixture}} untouched, unlike some other algorithms that > wrap the ml implementation, for the following reasons: > * mllib {{GaussianMixture}} allows k == 1, but ml does not. > * mllib {{GaussianMixture}} supports setting an initial model, but ml does not > currently. (We will definitely add this feature to ml in the future.) > Meanwhile, there is a big performance improvement for {{GaussianMixture}} in > this task. Since the covariance matrix of a multivariate gaussian distribution > is symmetric, we can store only the upper triangular part of the matrix, which > greatly reduces the shuffled data size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
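A rough sketch of the packing idea for the symmetric covariance matrix (column-major layout as in Breeze dense matrices; illustrative, not the actual patch):
{code}
// Pack the upper triangle of a symmetric n x n matrix (column-major array)
// into an array of length n * (n + 1) / 2, roughly halving the data that
// has to be shuffled.
def packUpperTriangle(n: Int, m: Array[Double]): Array[Double] = {
  val packed = new Array[Double](n * (n + 1) / 2)
  var idx = 0
  var j = 0
  while (j < n) {
    var i = 0
    while (i <= j) {
      packed(idx) = m(j * n + i) // element (i, j) with i <= j
      idx += 1
      i += 1
    }
    j += 1
  }
  packed
}
{code}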
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783821#comment-15783821 ] Devaraj K commented on SPARK-15359: --- [~yu2003w], it seems you are also facing the same issue I mentioned in the description. I have already created a PR for this issue; do you have a chance to try it and let me know your feedback? > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status from mesosDriver.run() during > successful registration, but if mesosDriver.run() returns > DRIVER_ABORTED status after successful registration then there is no action > for the status and the thread will be terminated. > I think we need to throw an exception and shut down the dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
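A minimal sketch of the proposed handling (method name and surrounding structure illustrative; only SchedulerDriver.run() and the status constant are from the Mesos API):
{code}
import org.apache.mesos.{Protos, SchedulerDriver}

// Hypothetical: surface an aborted driver instead of letting the thread
// exit silently, so the dispatcher can shut itself down.
def runMesosDriver(mesosDriver: SchedulerDriver): Unit = {
  val status = mesosDriver.run() // blocks until the driver stops or aborts
  if (status == Protos.Status.DRIVER_ABORTED) {
    throw new IllegalStateException(
      s"Mesos scheduler driver aborted (status $status); shutting down the dispatcher")
  }
}
{code}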
[jira] [Commented] (SPARK-16552) Store the Inferred Schemas into External Catalog Tables when Creating Tables
[ https://issues.apache.org/jira/browse/SPARK-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783705#comment-15783705 ] Yin Huai commented on SPARK-16552: -- [~smilegator] [~cloud_fan] I think we will not do partition discovery by default after SPARK-17861, right? Can you help me check whether we still need to write anything about this in the release notes? > Store the Inferred Schemas into External Catalog Tables when Creating Tables > > > Key: SPARK-16552 > URL: https://issues.apache.org/jira/browse/SPARK-16552 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Labels: release_notes, releasenotes > Fix For: 2.1.0 > > > Currently, in Spark SQL, the initial creation of schema can be classified > into two groups. It is applicable to both Hive tables and Data Source tables: > Group A. Users specify the schema. > Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema > of the SELECT clause. For example, > {noformat} > CREATE TABLE tab STORED AS TEXTFILE > AS SELECT * from input > {noformat} > Case 2 CREATE TABLE: users explicitly specify the schema. For example, > {noformat} > CREATE TABLE jsonTable (_1 string, _2 string) > USING org.apache.spark.sql.json > {noformat} > Group B. Spark SQL infers the schema at runtime. > Case 3 CREATE TABLE. Users do not specify the schema but the path to the file > location. For example, > {noformat} > CREATE TABLE jsonTable > USING org.apache.spark.sql.json > OPTIONS (path '${tempDir.getCanonicalPath}') > {noformat} > Now, Spark SQL does not store the inferred schema in the external catalog for > the cases in Group B. When users refresh the metadata cache or access the > table for the first time after (re-)starting Spark, Spark SQL will infer the > schema and store the info in the metadata cache to improve the performance > of subsequent metadata requests. However, the runtime schema inference could > cause undesirable schema changes after each reboot of Spark. > It is desirable to store the inferred schema in the external catalog when > creating the table. When users intend to refresh the schema, they issue > `REFRESH TABLE`. Spark SQL will infer the schema again based on the > previously specified table location and update/refresh the schema in the > external catalog and metadata cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783653#comment-15783653 ] Nattavut Sutyanyong commented on SPARK-19017: - The semantics of the NOT IN for multiple columns T1(a1, b1, ... ) NOT IN T2(a2, b2, ...) is # For any rows of T1, if a1 <> ALL (T2.a2), those rows are returned. # For any rows of T1, if a1 = ANY (T2.a2), take the qualified rows from T1 and T2 and compare the values from the next pair of columns with a condition similar to step 1. -- if b1 <> ALL (T2.b2), those rows are returned. # Repeat the steps until the last pair in the column list. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
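Tracing those steps against the data in the description (a spark-shell sketch; the expected result is shown in comments):
{code}
Seq((2, 1)).toDF("a1", "b1").createOrReplaceTempView("t1")
Seq[(java.lang.Integer, java.lang.Integer)]((1, null))
  .toDF("a2", "b2").createOrReplaceTempView("t2")

// Step 1: a1 = 2 and the only a2 is 1, so a1 <> ALL (t2.a2) holds and the
// row (2, 1) should be returned; step 2 (comparing b1 with b2) is never reached.
spark.sql("select * from t1 where (a1, b1) not in (select a2, b2 from t2)").show()
// expected: one row (2, 1); the empty result shown in the description is the bug
{code}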
[jira] [Resolved] (SPARK-18958) SparkR should support toJSON on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-18958. -- Resolution: Fixed Target Version/s: 2.2.0 > SparkR should support toJSON on DataFrame > - > > Key: SPARK-18958 > URL: https://issues.apache.org/jira/browse/SPARK-18958 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Minor > > It makes it easier to interoperate with other components (esp. since R does not > have JSON support built in) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
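For context, the Scala side already exposes this as {{Dataset.toJSON}}, which the SparkR function mirrors; a quick spark-shell illustration:
{code}
// toJSON turns each row into a JSON string, handy for interop with
// components that consume JSON rather than R data frames.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
df.toJSON.show(truncate = false)
// {"id":1,"value":"a"}
// {"id":2,"value":"b"}
{code}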
[jira] [Created] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
Nattavut Sutyanyong created SPARK-19017: --- Summary: NOT IN subquery with more than one column may return incorrect results Key: SPARK-19017 URL: https://issues.apache.org/jira/browse/SPARK-19017 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0 Reporter: Nattavut Sutyanyong When putting more than one column in the NOT IN, the query may not return correctly if there is null data. We can demonstrate the problem with the following data set and query: {code} Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show +---+---+ | a1| b1| +---+---+ +---+---+ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18669) Update Apache docs regard watermarking in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783633#comment-15783633 ] Apache Spark commented on SPARK-18669: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/16425 > Update Apache docs regard watermarking in Structured Streaming > -- > > Key: SPARK-18669 > URL: https://issues.apache.org/jira/browse/SPARK-18669 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543 ] Josh Bacon edited comment on SPARK-18737 at 12/28/16 7:39 PM: -- Hi Sean, We've performed more tests and are experiencing the same issues with the following minimal code reproduction. (Spark 2.0.2 w/ prebuilt hadoop 2.7): {code:title=Bar.scala|borderStyle=solid} import org.apache.spark.sql.SparkSession import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.kinesis.KinesisUtils import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream object StreamingFromKinesisTest { def main(args: Array[String]) { val endpointUrl = "https://kinesis.us-west-2.amazonaws.com"; val streamName = args(0); val appName = args(1); //DynamoDB name val region = "us-west-2"; val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate(); val batchInterval = Seconds(10); val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval); val kinesisStreams = (0 until 2).map { _ => KinesisUtils.createStream(streamingContext,appName,streamName,endpointUrl,region,InitialPositionInStream.TRIM_HORIZON,batchInterval,StorageLevel.MEMORY_AND_DISK_2); }; val streamOfArrayBytes = streamingContext.union(kinesisStreams); val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes)); streamStrings.foreachRDD((rddString, timestamp) => { println(timestamp); if (!rddString.isEmpty()) { println("Success!"); } }); streamingContext.start(); streamingContext.awaitTerminationOrTimeout(600) } } {code} {panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} 16/12/28 11:02:40 INFO BlockManager: Removing RDD 15 16/12/28 11:02:40 INFO BlockManager: Removing RDD 13 16/12/28 11:02:40 INFO BlockManager: Removing RDD 14 16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72 16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72) 16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1762.0 B, free 366.3 MB) 16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.6 KB, free 366.3 MB) 16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps) 16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely 16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72) com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994 at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) at org.apache.spark.SparkContext$$a
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543 ] Josh Bacon commented on SPARK-18737: Hi Sean, We've performed more tests and are experiencing the same issues with the following minimal code reproduction. (Spark 2.0.2 w/ prebuilt hadoop 2.7): {code:title=Bar.scala|borderStyle=solid} import org.apache.spark.sql.SparkSession import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.kinesis.KinesisUtils import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream object StreamingFromKinesisTest { def main(args: Array[String]) { val endpointUrl = "https://kinesis.us-west-2.amazonaws.com"; val streamName = args(0); val appName = args(1); //DynamoDB name val region = "us-west-2"; val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate(); val batchInterval = Seconds(10); val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval); val kinesisStreams = (0 until 2).map { _ => KinesisUtils.createStream(streamingContext,appName,streamName,endpointUrl,region,InitialPositionInStream.TRIM_HORIZON,batchInterval,StorageLevel.MEMORY_AND_DISK_2); }; val streamOfArrayBytes = streamingContext.union(kinesisStreams); val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes)); streamStrings.foreachRDD((rddString, timestamp) => { println(timestamp); if (!rddString.isEmpty()) { println("Success!"); } }); streamingContext.start(); streamingContext.awaitTerminationOrTimeout(600) } } {code} {panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} 16/12/28 11:02:40 INFO BlockManager: Removing RDD 15 16/12/28 11:02:40 INFO BlockManager: Removing RDD 13 16/12/28 11:02:40 INFO BlockManager: Removing RDD 14 16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72 16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72) 16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1762.0 B, free 366.3 MB) 16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.6 KB, free 366.3 MB) 16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps) 16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely 16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72) com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994 at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389) at 
scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
[jira] [Commented] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783538#comment-15783538 ] Apache Spark commented on SPARK-19016: -- User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/16424 > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19016: Assignee: Cheng Lian (was: Apache Spark) > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19016: Assignee: Apache Spark (was: Cheng Lian) > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Apache Spark >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10878) Race condition when resolving Maven coordinates via Ivy
[ https://issues.apache.org/jira/browse/SPARK-10878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783531#comment-15783531 ] Andrew Snare commented on SPARK-10878: -- I see this with Spark 2.0 as well. There doesn't appear to be a good workaround, although I assume avoiding {{--packages}} means the Ivy cache isn't used and therefore the conflict can't occur. > Race condition when resolving Maven coordinates via Ivy > --- > > Key: SPARK-10878 > URL: https://issues.apache.org/jira/browse/SPARK-10878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Ryan Williams >Priority: Minor > > I've recently been shell-scripting the creation of many concurrent > Spark-on-YARN apps and observing a fraction of them to fail with what I'm > guessing is a race condition in their Maven-coordinate resolution. > For example, I might spawn an app for each path in file {{paths}} with the > following shell script: > {code} > cat paths | parallel "$SPARK_HOME/bin/spark-submit foo.jar {}" > {code} > When doing this, I observe some fraction of the spawned jobs to fail with > errors like: > {code} > :: retrieving :: org.apache.spark#spark-submit-parent > confs: [default] > Exception in thread "main" java.lang.RuntimeException: problem during > retrieve of org.apache.spark#spark-submit-parent: java.text.ParseException: > failed to parse report: > /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: > Premature end of file. > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:249) > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:83) > at org.apache.ivy.Ivy.retrieve(Ivy.java:551) > at > org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1006) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.text.ParseException: failed to parse report: > /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: > Premature end of file. > at > org.apache.ivy.plugins.report.XmlReportParser.parse(XmlReportParser.java:293) > at > org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:329) > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:118) > ... 7 more > Caused by: org.xml.sax.SAXParseException; Premature end of file. > at > org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown > Source) > at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown > Source) > at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) > {code} > The more apps I try to launch simultaneously, the greater fraction of them > seem to fail with this or similar errors; a batch of ~10 will usually work > fine, a batch of 15 will see a few failures, and a batch of ~60 will have > dozens of failures. > [This gist shows 11 recent failures I > observed|https://gist.github.com/ryan-williams/648bff70e518de0c7c84]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
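One possible mitigation until this is fixed (untested sketch): give each concurrent submission its own Ivy directory through the {{spark.jars.ivy}} configuration so the processes never contend on a shared cache, e.g. with GNU parallel's job-number placeholder:
{noformat}
cat paths | parallel "$SPARK_HOME/bin/spark-submit --conf spark.jars.ivy=/tmp/ivy-{#} foo.jar {}"
{noformat}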
[jira] [Created] (SPARK-19016) Document scalable partition handling feature in the programming guide
Cheng Lian created SPARK-19016: -- Summary: Document scalable partition handling feature in the programming guide Key: SPARK-19016 URL: https://issues.apache.org/jira/browse/SPARK-19016 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.1.0, 2.2.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Currently, we only mention this in the migration guide. Should also document it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18966) NOT IN subquery with correlated expressions may return incorrect result
[ https://issues.apache.org/jira/browse/SPARK-18966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783457#comment-15783457 ] Nattavut Sutyanyong edited comment on SPARK-18966 at 12/28/16 6:40 PM: --- Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3. When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups: Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) are filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1 is null, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualify as the result set, that is, the rows from (2.1, 2.2 and 2.3). In conclusion, the correct result set of the above query is {code} T1.A1 T1.B1 - - 2 1(1.2) 1 null(2.1) null 2(2.2 & 2.3) {code} was (Author: nsyca): Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3.
When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups:- Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) is filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualifies as the result set, that is, the rows fr
[jira] [Commented] (SPARK-18966) NOT IN subquery with correlated expressions may return incorrect result
[ https://issues.apache.org/jira/browse/SPARK-18966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783457#comment-15783457 ] Nattavut Sutyanyong commented on SPARK-18966: - Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3. When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups: Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) are filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1 is null, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualify as the result set, that is, the rows from (2.1, 2.2 and 2.3). In conclusion, the correct result set of the above query is {code} T1.A1 T1.B1 - - 2 1(1.2) 1 null(2.1) null 2(2.2 & 2.3) {code} > NOT IN subquery with correlated expressions may return incorrect result > --- > > Key: SPARK-18966 > URL: https://issues.apache.org/jira/browse/SPARK-18966 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Labels: correctness > > {code} > Seq((1, 2)).toDF("a1", "b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer, java.lang.Integer)]((1, null)).toDF("a2", > "b2").createOrReplaceTempView("t2") > // The expected result is 1 row of (1,2) as shown in the next statement. > sql("select * from t1 where a1 not in (select a2 from t2 where b2 = b1)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > sql("select * from t1 where a1 not in (select a2 from t2 where b2 = 2)").show > +---+---+ > | a1| b1| > +---+---+ > | 1| 2| > +---+---+ > {code} > The two SQL statements above should return the same result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783435#comment-15783435 ] Sheridan Rawlins commented on SPARK-3246: - Hey, I have a solution that just uses liblinear to do the work. Not sure whether committing the added dependencies would be acceptable, but if it is, I also did the spark.ml port to gain all of the cross-validation / hyperparameter-tuning goodness. -SCR > Support weighted SVMWithSGD for classification of unbalanced dataset > > > Key: SPARK-3246 > URL: https://issues.apache.org/jira/browse/SPARK-3246 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 0.9.0, 1.0.2 >Reporter: mahesh bhole > > Please support weighted SVMWithSGD for binary classification of unbalanced > datasets. Though other options like undersampling or oversampling can be > used, it will be good if we can have a way to assign weights to the minority > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17999) Add getPreferredLocations for KafkaSourceRDD
[ https://issues.apache.org/jira/browse/SPARK-17999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17999: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > Add getPreferredLocations for KafkaSourceRDD > > > Key: SPARK-17999 > URL: https://issues.apache.org/jira/browse/SPARK-17999 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Minor > Fix For: 2.0.2, 2.1.0 > > > The newly implemented Structured Streaming KafkaSource did calculate the > preferred locations for each topic partition, but didn't offer this > information through RDD's {{getPreferredLocations}} method. So here propose > to add this method in {{KafkaSourceRDD}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
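For reference, the RDD hook involved; a sketch of the shape of the change, assuming the preferred executor location is already computed per partition (not the actual patch):
{code}
import org.apache.spark.Partition

// Hypothetical partition type carrying the location KafkaSource computed.
class KafkaSourceRDDPartition(val index: Int, val preferredLoc: Option[String])
  extends Partition

// Inside KafkaSourceRDD (an RDD subclass), overriding getPreferredLocations
// lets the scheduler place each task on the executor that already holds the
// cached Kafka consumer for that topic partition:
//   override def getPreferredLocations(split: Partition): Seq[String] =
//     split.asInstanceOf[KafkaSourceRDDPartition].preferredLoc.toSeq
{code}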
[jira] [Updated] (SPARK-15698) Ability to remove old metadata for structure streaming MetadataLog
[ https://issues.apache.org/jira/browse/SPARK-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-15698: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > Ability to remove old metadata for structure streaming MetadataLog > -- > > Key: SPARK-15698 > URL: https://issues.apache.org/jira/browse/SPARK-15698 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Saisai Shao >Assignee: Saisai Shao > Fix For: 2.0.1, 2.1.0 > > > Current MetadataLog lacks the ability to remove old checkpoint file, we'd > better add this functionality to the MetadataLog and honor it in the place > where MetadataLog is used, that will reduce unnecessary small files in the > long running scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16963) Change Source API so that sources do not need to keep unbounded state
[ https://issues.apache.org/jira/browse/SPARK-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16963: - Component/s: (was: DStreams) Structured Streaming > Change Source API so that sources do not need to keep unbounded state > - > > Key: SPARK-16963 > URL: https://issues.apache.org/jira/browse/SPARK-16963 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.0.2, 2.1.0 > > > The version of the Source API in Spark 2.0.0 defines a single getBatch() > method for fetching records from the source, with the following Scaladoc > comments defining the semantics: > {noformat} > /** > * Returns the data that is between the offsets (`start`, `end`]. When > `start` is `None` then > * the batch should begin with the first available record. This method must > always return the > * same data for a particular `start` and `end` pair. > */ > def getBatch(start: Option[Offset], end: Offset): DataFrame > {noformat} > These semantics mean that a Source must retain all past history for the > stream that it backs. Further, a Source is also required to retain this data > across restarts of the process where the Source is instantiated, even when > the Source is restarted on a different machine. > These restrictions make it difficult to implement the Source API, as any > implementation requires potentially unbounded amounts of distributed storage. > See the mailing list thread at > [http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html] > for more information. > This JIRA will cover augmenting the Source API with an additional callback > that will allow Structured Streaming scheduler to notify the source when it > is safe to discard buffered data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
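A sketch of the kind of callback being proposed (name and default illustrative; see the linked thread for the discussion):
{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Offset

// Hypothetical shape of the augmented trait: once the sink has durably
// processed everything up to `end`, the scheduler notifies the source,
// which may then discard buffered data at or before that offset.
trait Source {
  def getBatch(start: Option[Offset], end: Offset): DataFrame

  // No-op default keeps existing Source implementations compiling.
  def commit(end: Offset): Unit = {}

  // schema, getOffset, stop, ... elided
}
{code}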
[jira] [Updated] (SPARK-17153) [Structured streams] readStream ignores partition columns
[ https://issues.apache.org/jira/browse/SPARK-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17153: - Component/s: (was: DStreams) Structured Streaming > [Structured streams] readStream ignores partition columns > - > > Key: SPARK-17153 > URL: https://issues.apache.org/jira/browse/SPARK-17153 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.0 >Reporter: Dmitri Carpov >Assignee: Liang-Chi Hsieh > Labels: release_notes, releasenotes > Fix For: 2.0.2, 2.1.0 > > > When parquet files are persisted using partitions, spark's `readStream` > returns data with all `null`s for the partitioned columns. > For example: > {noformat} > case class A(id: Int, value: Int) > val data = spark.createDataset(Seq( > A(1, 1), > A(2, 2), > A(2, 3)) > ) > val url = "/mnt/databricks/test" > data.write.partitionBy("id").parquet(url) > {noformat} > when the data is read as a stream: > {noformat} > spark.readStream.schema(spark.read.load(url).schema).parquet(url) > {noformat} > it reads: > {noformat} > id, value > null, 1 > null, 2 > null, 3 > {noformat} > A possible reason is that `readStream` reads the parquet files directly, but when those > are stored, the columns they are partitioned by are excluded from the files > themselves. In the given example the parquet files contain `value` information > only, since `id` is a partition column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
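For contrast, a plain batch read over the same directory does recover the partition column through partition discovery, which is presumably what the stream should match (spark-shell sketch, reusing `url` from the description):
{code}
// Batch reads run partition discovery on the directory layout (id=1/, id=2/),
// so `id` comes back populated even though it is absent from the files.
spark.read.parquet(url).show()
{code}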
[jira] [Updated] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17085: - Component/s: (was: DStreams) Structured Streaming > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Assignee: Jagadeesan A S >Priority: Minor > Fix For: 2.0.1, 2.1.0 > > > Spark Stuctured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17475) HDFSMetadataLog should not leak CRC files
[ https://issues.apache.org/jira/browse/SPARK-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17475: - Component/s: (was: DStreams) Structured Streaming > HDFSMetadataLog should not leak CRC files > - > > Key: SPARK-17475 > URL: https://issues.apache.org/jira/browse/SPARK-17475 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.1 >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.1.0 > > > When HDFSMetadataLog uses a log directory on a filesystem other than HDFS > (i.e. NFS or the driver node's local filesystem), the class leaves orphan > checksum (CRC) files in the log directory. The files have names that follow > the pattern "..[long UUID hex string].tmp.crc". These files exist because > HDFSMetaDataLog renames other temporary files without renaming the > corresponding checksum files. There is one CRC file per batch, so the > directory fills up quite quickly. > I'm not certain, but this problem might also occur on certain versions of the > HDFS APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
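A sketch of the usual remedy with the Hadoop FileSystem API: remove the stale sidecar checksum when renaming the temp file, so no orphan `.crc` is left behind (illustrative; helper name hypothetical):
{code}
import org.apache.hadoop.fs.{FileSystem, Path}

// Checksummed filesystems name the sidecar ".<name>.crc" in the same
// directory; delete it by hand when rename() does not carry it along.
def renameWithCrcCleanup(fs: FileSystem, tmp: Path, dst: Path): Boolean = {
  val renamed = fs.rename(tmp, dst)
  val crc = new Path(tmp.getParent, s".${tmp.getName}.crc")
  if (renamed && fs.exists(crc)) fs.delete(crc, false)
  renamed
}
{code}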
[jira] [Updated] (SPARK-17513) StreamExecution should discard unneeded metadata
[ https://issues.apache.org/jira/browse/SPARK-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17513: - Component/s: (was: DStreams) Structured Streaming > StreamExecution should discard unneeded metadata > > > Key: SPARK-17513 > URL: https://issues.apache.org/jira/browse/SPARK-17513 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.0.1, 2.1.0 > > > The StreamExecution maintains a write-ahead log of batch metadata in order to > allow repeating previously in-flight batches if the driver is restarted. > StreamExecution does not garbage-collect or compact this log in any way. > Since the log is implemented with HDFSMetadataLog, these files will consume > memory on the HDFS NameNode. Specifically, each log file will consume about > 300 bytes of NameNode memory (150 bytes for the inode and 150 bytes for the > block of file contents; see > [https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html]. > An application with a 100 msec batch interval will increase the NameNode's > heap usage by about 250MB per day. > There is also the matter of recovery. StreamExecution reads its entire log > when restarting. This read operation will be very expensive if the log > contains millions of entries spread over millions of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
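For reference, the arithmetic behind the 250MB/day estimate:
{noformat}
100 msec batches => 10 files/sec * 86,400 sec/day = 864,000 log files/day
864,000 files/day * ~300 bytes/file ~= 260 MB/day of NameNode heap
{noformat}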
[jira] [Updated] (SPARK-18152) CLONE - FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18152: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-18152 > URL: https://issues.apache.org/jira/browse/SPARK-18152 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18030) Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
[ https://issues.apache.org/jira/browse/SPARK-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18030: - Component/s: (was: DStreams) Structured Streaming > Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite > - > > Key: SPARK-18030 > URL: https://issues.apache.org/jira/browse/SPARK-18030 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Reporter: Davies Liu >Assignee: Shixiong Zhu > Fix For: 2.0.2, 2.1.0 > > > https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.streaming.FileStreamSourceSuite&test_name=when+schema+inference+is+turned+on%2C+should+read+partition+data -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18151) CLONE - MetadataLog should support purging old logs
[ https://issues.apache.org/jira/browse/SPARK-18151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18151: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - MetadataLog should support purging old logs > --- > > Key: SPARK-18151 > URL: https://issues.apache.org/jira/browse/SPARK-18151 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > This is a useful primitive operation to have to support checkpointing and > forgetting old logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18153) CLONE - Ability to remove old metadata for structure streaming MetadataLog
[ https://issues.apache.org/jira/browse/SPARK-18153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18153: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - Ability to remove old metadata for structure streaming MetadataLog > -- > > Key: SPARK-18153 > URL: https://issues.apache.org/jira/browse/SPARK-18153 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Saisai Shao > Fix For: 2.0.1, 2.1.0 > > > Current MetadataLog lacks the ability to remove old checkpoint file, we'd > better add this functionality to the MetadataLog and honor it in the place > where MetadataLog is used, that will reduce unnecessary small files in the > long running scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18156) CLONE - StreamExecution should discard unneeded metadata
[ https://issues.apache.org/jira/browse/SPARK-18156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18156: - Component/s: (was: DStreams) Structured Streaming > CLONE - StreamExecution should discard unneeded metadata > > > Key: SPARK-18156 > URL: https://issues.apache.org/jira/browse/SPARK-18156 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Frederick Reiss > Fix For: 2.0.1, 2.1.0 > > > The StreamExecution maintains a write-ahead log of batch metadata in order to > allow repeating previously in-flight batches if the driver is restarted. > StreamExecution does not garbage-collect or compact this log in any way. > Since the log is implemented with HDFSMetadataLog, these files will consume > memory on the HDFS NameNode. Specifically, each log file will consume about > 300 bytes of NameNode memory (150 bytes for the inode and 150 bytes for the > block of file contents; see > [https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html]. > An application with a 100 msec batch interval will increase the NameNode's > heap usage by about 250MB per day. > There is also the matter of recovery. StreamExecution reads its entire log > when restarting. This read operation will be very expensive if the log > contains millions of entries spread over millions of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18154) CLONE - Change Source API so that sources do not need to keep unbounded state
[ https://issues.apache.org/jira/browse/SPARK-18154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18154: - Component/s: (was: DStreams) Structured Streaming > CLONE - Change Source API so that sources do not need to keep unbounded state > - > > Key: SPARK-18154 > URL: https://issues.apache.org/jira/browse/SPARK-18154 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Sunil Kumar >Assignee: Frederick Reiss > Fix For: 2.0.2, 2.1.0 > > > The version of the Source API in Spark 2.0.0 defines a single getBatch() > method for fetching records from the source, with the following Scaladoc > comments defining the semantics: > {noformat} > /** > * Returns the data that is between the offsets (`start`, `end`]. When > `start` is `None` then > * the batch should begin with the first available record. This method must > always return the > * same data for a particular `start` and `end` pair. > */ > def getBatch(start: Option[Offset], end: Offset): DataFrame > {noformat} > These semantics mean that a Source must retain all past history for the > stream that it backs. Further, a Source is also required to retain this data > across restarts of the process where the Source is instantiated, even when > the Source is restarted on a different machine. > These restrictions make it difficult to implement the Source API, as any > implementation requires potentially unbounded amounts of distributed storage. > See the mailing list thread at > [http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html] > for more information. > This JIRA will cover augmenting the Source API with an additional callback > that will allow Structured Streaming scheduler to notify the source when it > is safe to discard buffered data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf Design doc v1 > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > Subqueries in Spark SQL are each executed even when they share the same > physical plan and produce the same results. We should be able to deduplicate > subqueries that are referenced multiple times within a query.
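To make the duplication concrete, a small example of the pattern this improvement targets; the table and column names are invented for illustration. Both scalar subqueries below are identical, yet each is currently planned and executed on its own:
{code}
// Hypothetical tables/columns; both scalar subqueries compute the same value.
val df = spark.sql("""
  SELECT name
  FROM employees
  WHERE salary > (SELECT avg(salary) FROM employees)
     OR bonus  > (SELECT avg(salary) FROM employees)
""")
{code}
With deduplication, the shared subquery would run once and feed both predicates.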
[jira] [Resolved] (SPARK-17772) Add helper testing methods for instance weighting
[ https://issues.apache.org/jira/browse/SPARK-17772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-17772. - Resolution: Fixed Fix Version/s: 2.2.0 > Add helper testing methods for instance weighting > - > > Key: SPARK-17772 > URL: https://issues.apache.org/jira/browse/SPARK-17772 > Project: Spark > Issue Type: Test > Components: ML >Reporter: Seth Hendrickson >Assignee: Seth Hendrickson >Priority: Minor > Fix For: 2.2.0 > > > More and more ML algorithms accept instance weights. We keep replicating code > to test instance weighting in every test suite, which will get out of hand > rather quickly. We can and should implement generic instance-weighting test > helper methods so that we can reduce duplicated code and standardize these > tests.
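For illustration, one possible shape for such a helper, checking the common invariant that fitting with a weight column matches fitting on physically replicated rows; the method name and parameters are assumptions, not the committed API:
{code}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.sql.DataFrame

// Sketch of a shared test helper (names are assumptions). Fitting an
// estimator configured with weightCol = "weight" on weighted data should
// produce (approximately) the same model as fitting an identically
// configured, unweighted estimator on data where each row has been
// physically replicated `weight` times.
def testOversamplingVsWeighting[M <: Model[M]](
    weightedEstimator: Estimator[M],   // weightCol already set to "weight"
    plainEstimator: Estimator[M],      // same params, no weight column
    weightedData: DataFrame,
    oversampledData: DataFrame,
    modelEquals: (M, M) => Unit): Unit = {
  modelEquals(weightedEstimator.fit(weightedData),
              plainEstimator.fit(oversampledData))
}
{code}
Each suite would supply its own modelEquals check (e.g. comparing coefficients within a tolerance), which is where most of the duplicated code lives today.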
[jira] [Resolved] (SPARK-17645) Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE)
[ https://issues.apache.org/jira/browse/SPARK-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-17645. - Resolution: Fixed Fix Version/s: 2.2.0 > Add feature selector methods based on: False Discovery Rate (FDR) and Family > Wise Error rate (FWE) > -- > > Key: SPARK-17645 > URL: https://issues.apache.org/jira/browse/SPARK-17645 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Peng Meng >Assignee: Peng Meng >Priority: Minor > Fix For: 2.2.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > Univariate feature selection works by selecting the best features based on > univariate statistical tests. > FDR and FWE are popular univariate statistical tests for feature selection. > In 2005, the Benjamini and Hochberg paper on FDR was identified as one of the > 25 most-cited statistical papers. The FDR implementation in this PR uses the > Benjamini-Hochberg procedure > (https://en.wikipedia.org/wiki/False_discovery_rate). > In statistics, FWE is the probability of making one or more false > discoveries, or type I errors, among all the hypotheses when performing > multiple hypothesis tests > (https://en.wikipedia.org/wiki/Family-wise_error_rate). > This PR adds FDR and FWE methods to ChiSqSelector, as implemented in > scikit-learn: > http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
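For reference, a standalone sketch of the Benjamini-Hochberg rule mentioned above (not the actual ChiSqSelector code): with m p-values sorted ascending, select every feature ranked at or below the largest k satisfying p(k) <= (k/m) * alpha.
{code}
// Standalone Benjamini-Hochberg sketch: returns indices of selected features.
def selectByFdr(pValues: Array[Double], alpha: Double): Array[Int] = {
  val m = pValues.length
  val ranked = pValues.zipWithIndex.sortBy(_._1)  // ascending p-values
  // Largest 1-based rank k with p(k) <= (k / m) * alpha, or 0 if none.
  val maxRank = (1 to m).foldLeft(0) { (best, k) =>
    if (ranked(k - 1)._1 <= k.toDouble / m * alpha) k else best
  }
  ranked.take(maxRank).map(_._2).sorted  // original feature indices
}
{code}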
[jira] [Assigned] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17642: Assignee: Apache Spark > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang >Assignee: Apache Spark > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Assigned] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17642: Assignee: (was: Apache Spark) > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Commented] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782943#comment-15782943 ] Apache Spark commented on SPARK-17642: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16422 > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
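For a sense of the intended usage, a hypothetical invocation of the proposed command; the table and column names are made up, and the listed statistics are only what one would expect to see once ANALYZE has collected column-level stats:
{code}
// Hypothetical usage of the proposed command (names invented):
spark.sql("DESC FORMATTED customers age").show()
// expected to display column-level statistics for customers.age,
// e.g. distinct count, null count, min/max, avg/max column length
{code}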
[jira] [Created] (SPARK-19015) SQL query with transformation cannot be executed without first running a table scan
lakhdar adil created SPARK-19015: Summary: SQL query with transformation cannot be executed without first running a table scan Key: SPARK-19015 URL: https://issues.apache.org/jira/browse/SPARK-19015 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: lakhdar adil Hello, I have a Spark Streaming job that consumes from Kafka and sends results to Elasticsearch. I run a UNION query between two tables, "statswithrowid" and "queryes": sqlContext.sql(s"select id, rowid,agentId,datecalcul,'KAFKA' as source from statswithrowid where id IN ($ids) and agentId = '$agent' UNION select id, rowid,agentId,datecalcul, 'ES' as source from queryes where agentId = '$agent'") This query cannot be executed on its own. Today I first have to run the following two queries so that the union query works. Query on the "statswithrowid" table: sqlContext.sql(s"select id, rowid,agentId,datecalcul,'KAFKA' as source from statswithrowid where id IN ($ids) and agentId = '$agent'").show() Query on the "queryes" table: sqlContext.sql(s"select id, rowid,agentId,datecalcul, 'ES' as source from queryes where agentId = '$agent'").show() For information: if I don't call .show() on these two queries before the union, nothing works. Why do I need to run them first before the union query? What is the best way to work with union queries? I also tried a union on DataFrames and hit the same problem. I look forward to your reply. Thank you in advance. Best regards, Adil LAKHDAR
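For comparison, the DataFrame-level formulation the reporter says fails the same way; a sketch using the Spark 1.6 API (unionAll was renamed union in 2.0), assuming the two temp tables are already registered:
{code}
// DataFrame equivalent of the SQL UNION above (Spark 1.6 API).
val kafkaDf = sqlContext.table("statswithrowid")
  .where(s"id IN ($ids) AND agentId = '$agent'")
  .selectExpr("id", "rowid", "agentId", "datecalcul", "'KAFKA' AS source")
val esDf = sqlContext.table("queryes")
  .where(s"agentId = '$agent'")
  .selectExpr("id", "rowid", "agentId", "datecalcul", "'ES' AS source")
// SQL UNION deduplicates, so distinct() is needed to match it exactly.
val union = kafkaDf.unionAll(esDf).distinct()
{code}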
[jira] [Commented] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782927#comment-15782927 ] Apache Spark commented on SPARK-19014: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/16417 > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >
[jira] [Assigned] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19014: Assignee: Wenchen Fan (was: Apache Spark) > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >
[jira] [Assigned] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19014: Assignee: Apache Spark (was: Wenchen Fan) > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark >
[jira] [Created] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
Wenchen Fan created SPARK-19014: --- Summary: support complex aggregate buffer in HashAggregateExec Key: SPARK-19014 URL: https://issues.apache.org/jira/browse/SPARK-19014 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Updated] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. was: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Updated] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. was: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics including histograms are supported. > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Assigned] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-18993: - Assignee: Sean Owen > Unable to build/compile Spark in IntelliJ due to missing Scala deps in > spark-tags > - > > Key: SPARK-18993 > URL: https://issues.apache.org/jira/browse/SPARK-18993 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Xiao Li >Assignee: Sean Owen >Priority: Critical > Fix For: 2.0.3, 2.1.1, 2.2.0 > > > After https://github.com/apache/spark/pull/16311 was merged, I am unable to > build Spark in IntelliJ. I get the following compilation error: > {noformat} > Error:scalac: error while loading Object, Missing dependency 'object scala in > compiler mirror', required by > /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class) > Error:scalac: Error: object scala in compiler mirror not found. > scala.reflect.internal.MissingRequirementError: object scala in compiler > mirror not found. > at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66) > at > scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395) > at scala.tools.nsc.Global$Run.(Global.scala:1215) > at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:94) > at xsbt.CompilerInterface.run(CompilerInterface.scala:22) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) > at > org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29) > at > org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) > at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67) > at > org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24) > at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) > {noformat}
[jira] [Resolved] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18993. --- Resolution: Fixed Fix Version/s: 2.2.0 2.0.3 2.1.1 Issue resolved by pull request 16418 [https://github.com/apache/spark/pull/16418] > Unable to build/compile Spark in IntelliJ due to missing Scala deps in > spark-tags > - > > Key: SPARK-18993 > URL: https://issues.apache.org/jira/browse/SPARK-18993 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Xiao Li >Priority: Critical > Fix For: 2.1.1, 2.0.3, 2.2.0 > > > After https://github.com/apache/spark/pull/16311 was merged, I am unable to > build Spark in IntelliJ. I get the following compilation error: > {noformat} > Error:scalac: error while loading Object, Missing dependency 'object scala in > compiler mirror', required by > /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class) > Error:scalac: Error: object scala in compiler mirror not found. > scala.reflect.internal.MissingRequirementError: object scala in compiler > mirror not found. > at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66) > at > scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395) > at scala.tools.nsc.Global$Run.(Global.scala:1215) > at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:94) > at xsbt.CompilerInterface.run(CompilerInterface.scala:22) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) > at > org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29) > at > org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) > at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67) > at > org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24) > at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at
com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) > {noformat}