[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784796#comment-15784796 ] Jork Zijlstra edited comment on SPARK-19012 at 12/29/16 7:56 AM: - Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it. {code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} Later I want to check if the tableOrViewName has already been created and not call createOrReplaceTempView every time, but that is just a performance improvement. was (Author: jzijlstra): Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it.
{code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the first char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE'
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784796#comment-15784796 ] Jork Zijlstra commented on SPARK-19012: --- Good to see that it's already being discussed. MSSQL also has some limitations on tableOrViewNames, which are described in its documentation. Maybe updating the annotation of the method would also be enough. Having an Exception with a clear reason would definitely already be a fix. [~hvanhovell] We specify our queries inside a configuration, not in the code. So we have this in our config: dataPath = "hdfs://" dataQuery: "SELECT column1, column2 FROM \[TABLE] WHERE 1 = 1" Since we have one SparkSession for the application, the tableOrViewName is coupled to that, and we don't want to specify an extra config option for the tableOrViewName, I thought I'd just use the hashcode of the dataQuery as the tableOrViewName: use that in createOrReplaceTempView and replace \[TABLE] inside the query with it. {code} val path = "hdfs://{path}" val dataQuery = "SELECT * FROM [TABLE] LIMIT 1" val tableOrViewName = "_" + Math.abs(path.hashCode).toString + Math.abs(dataQuery.hashCode).toString val df = sparkSession.read.orc(path) df.createOrReplaceTempView(tableOrViewName) val result = sparkSession.sqlContext.sql(dataQuery.replace("[TABLE]", tableOrViewName)).collect {code} > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the first char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 
'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
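To make the failure mode concrete, here is a minimal sketch (the DataFrame and names are illustrative, not taken from the reporter's job); it shows a digit-leading view name failing to parse, and the underscore-prefix workaround from the comment above:
{code}
// Minimal sketch; sparkSession is an existing SparkSession.
val df = sparkSession.range(1).toDF("value")

df.createOrReplaceTempView("a1") // works: name starts with a letter
// df.createOrReplaceTempView("1a") // fails with the ParseException quoted above

// Workaround from the comment: prefix the hash-derived name so it can never start with a digit.
val viewName = "_" + Math.abs("SELECT * FROM [TABLE] LIMIT 1".hashCode).toString
df.createOrReplaceTempView(viewName)
{code}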
[jira] [Assigned] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19021: Assignee: (was: Apache Spark) > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19021: Assignee: Apache Spark > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Assignee: Apache Spark >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784772#comment-15784772 ] Apache Spark commented on SPARK-19021: -- User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/16432 > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
Saisai Shao created SPARK-19021: --- Summary: Generalize HDFSCredentialProvider to support non-HDFS secure FS Key: SPARK-19021 URL: https://issues.apache.org/jira/browse/SPARK-19021 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.1.0 Reporter: Saisai Shao Currently Spark can only get the token renewal interval from secure HDFS (hdfs://); if Spark runs with other secure file systems such as webHDFS (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not obtain renewal intervals from them, which makes Spark unable to work with those secure clusters. So instead of only checking the HDFS token, we should generalize to support different {{DelegationTokenIdentifier}}s. This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19021) Generalize HDFSCredentialProvider to support non-HDFS secure FS
[ https://issues.apache.org/jira/browse/SPARK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-19021: Priority: Minor (was: Major) > Generalize HDFSCredentialProvider to support non-HDFS secure FS > -- > > Key: SPARK-19021 > URL: https://issues.apache.org/jira/browse/SPARK-19021 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.1.0 >Reporter: Saisai Shao >Priority: Minor > > Currently Spark can only get the token renewal interval from secure HDFS > (hdfs://); if Spark runs with other secure file systems such as webHDFS > (webhdfs://), WASB (wasb://), or ADLS, it will ignore those tokens and not > obtain renewal intervals from them, which makes Spark unable to work with > those secure clusters. So instead of only checking the HDFS token, we should > generalize to support different {{DelegationTokenIdentifier}}s. > This is a follow-up work of SPARK-18840. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
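As a rough sketch of the generalization being proposed (the helper functions here are illustrative, not the actual patch; the Hadoop classes and methods are the public API): obtain delegation tokens from whatever secure FileSystem backs each path, and read renewal-related fields from any identifier extending {{AbstractDelegationTokenIdentifier}} instead of matching only the HDFS-specific class.
{code}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier

// hdfs://, webhdfs://, wasb://, adl://, ... all go through the same call.
def obtainTokens(paths: Seq[Path], renewer: String, conf: Configuration): Credentials = {
  val creds = new Credentials()
  paths.foreach(p => p.getFileSystem(conf).addDelegationTokens(renewer, creds))
  creds
}

// Read issue dates from any delegation-token identifier, not just HDFS's.
def issueDates(creds: Credentials): Seq[Long] =
  creds.getAllTokens.asScala.toSeq.flatMap { token =>
    token.decodeIdentifier() match {
      case id: AbstractDelegationTokenIdentifier => Some(id.getIssueDate)
      case _ => None // tokens without a decodable delegation identifier
    }
  }
{code}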
[jira] [Comment Edited] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784678#comment-15784678 ] Song Jun edited comment on SPARK-18930 at 12/29/16 6:44 AM: From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? [~srowen] was (Author: windpiger): From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can imagine, these numbers are the counts of records. But! When I do select * > from temp.test_partitioning_4, the data is correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784678#comment-15784678 ] Song Jun commented on SPARK-18930: -- From the Hive documentation, https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert: "Note that the dynamic partition values are selected by ordering, not name, and taken as the last columns from the select clause." Testing it on Hive also shows the same logic as your description. I think we can close this JIRA? > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can imagine, these numbers are the counts of records. But! When I do select * > from temp.test_partitioning_4, the data is correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
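For illustration, here is the insert from the description rewritten per that rule (same tables and columns as the report; the only change is the column order), which would write the intended partitions:
{code}
// Dynamic partition values are taken positionally from the LAST columns of the
// SELECT, so the partition column 'day' must come last, not first.
sparkSession.sql("""
  INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
  SELECT count(*) AS num, day
  FROM hss.session
  WHERE year = 2016 AND month = 4
  GROUP BY day
""")
{code}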
[jira] [Assigned] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19020: Assignee: Apache Spark > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19020: Assignee: (was: Apache Spark) > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19020) Cardinality estimation of aggregate operator
[ https://issues.apache.org/jira/browse/SPARK-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784674#comment-15784674 ] Apache Spark commented on SPARK-19020: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16431 > Cardinality estimation of aggregate operator > > > Key: SPARK-19020 > URL: https://issues.apache.org/jira/browse/SPARK-19020 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
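The ticket itself carries no description; for orientation, here is a hedged sketch of the textbook estimate a rule like this computes (this mirrors the standard cost-based approach, not necessarily the PR's exact code):
{code}
// Rows out of GROUP BY c1, ..., cn: the product of the grouping columns'
// distinct counts, capped by the child's row count.
def estimateAggregateRows(childRows: BigInt, distinctCounts: Seq[BigInt]): BigInt =
  distinctCounts.foldLeft(BigInt(1))(_ * _).min(childRows)

// e.g. 1,000,000 input rows, GROUP BY gender (ndv 2) and country (ndv 50)
// => min(1000000, 2 * 50) = 100 estimated output rows
{code}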
[jira] [Resolved] (SPARK-18567) Simplify CreateDataSourceTableAsSelectCommand
[ https://issues.apache.org/jira/browse/SPARK-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-18567. -- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 15996 [https://github.com/apache/spark/pull/15996] > Simplify CreateDataSourceTableAsSelectCommand > - > > Key: SPARK-18567 > URL: https://issues.apache.org/jira/browse/SPARK-18567 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Comment: was deleted (was: Design doc v1) > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: (was: de-duplicating subqueries.pdf) > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > The subqueries in SparkSQL will be run even when they have the same physical plan > and output the same results. We should be able to deduplicate the subqueries > that are referred to many times in a query. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
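A small example of the duplication the description refers to ({{t}} and {{big}} are hypothetical tables): both scalar subqueries below have identical plans and results, yet without de-duplication each is executed separately.
{code}
sparkSession.sql("""
  SELECT *
  FROM t
  WHERE a > (SELECT avg(x) FROM big)
     OR b > (SELECT avg(x) FROM big)
""")
{code}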
[jira] [Created] (SPARK-19020) Cardinality estimation of aggregate operator
Zhenhua Wang created SPARK-19020: Summary: Cardinality estimation of aggregate operator Key: SPARK-19020 URL: https://issues.apache.org/jira/browse/SPARK-19020 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Zhenhua Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation of project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation of project operator (was: Cardinality estimation for project operator) > Cardinality estimation of project operator > -- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation for project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation for project operator (was: Cardinality estimation project operator) > Cardinality estimation for project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17077: Assignee: Apache Spark > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17077: Assignee: (was: Apache Spark) > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784384#comment-15784384 ] Apache Spark commented on SPARK-17077: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16430 > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17077) Cardinality estimation project operator
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation project operator (was: Cardinality estimation of group-by, project, union, etc.) > Cardinality estimation project operator > --- > > Key: SPARK-17077 > URL: https://issues.apache.org/jira/browse/SPARK-17077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16213) Reduce runtime overhead of a program that creates a primitive array in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16213. - Resolution: Fixed Assignee: Kazuaki Ishizaki Fix Version/s: 2.2.0 > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > - > > Key: SPARK-16213 > URL: https://issues.apache.org/jira/browse/SPARK-16213 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0 > > > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > When a program creates an array in DataFrame, the code generator creates > boxing operations. If an array is of a primitive type, there are some > opportunities for optimization in the generated code to reduce runtime overhead. > Here is a simple example whose generated code contains boxing operations: > {code} > val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF > df.selectExpr("Array(value + 1.1d, value + 2.2d)").show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
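A plain-Scala sketch of the overhead in question (illustrative, not the generated code itself): typing the array as {{Array[Any]}} forces one {{java.lang.Double}} box per element, which a primitive-specialized path avoids.
{code}
val value = 0.0d
val boxed: Array[Any] = Array(value + 1.1d, value + 2.2d)        // two boxed Doubles allocated
val primitive: Array[Double] = Array(value + 1.1d, value + 2.2d) // raw doubles, no per-element allocation
{code}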
[jira] [Assigned] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19019: Assignee: Apache Spark > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19019: Assignee: (was: Apache Spark) > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19019) PySpark does not work with Python 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784329#comment-15784329 ] Apache Spark commented on SPARK-19019: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/16429 > PySpark does not work with Python 3.6.0 > --- > > Key: SPARK-19019 > URL: https://issues.apache.org/jira/browse/SPARK-19019 > Project: Spark > Issue Type: Bug > Components: PySpark >Reporter: Hyukjin Kwon >Priority: Critical > > Currently, PySpark does not work with Python 3.6.0. > Running {{./bin/pyspark}} simply throws the error as below: > {code} > Traceback (most recent call last): > File ".../spark/python/pyspark/shell.py", line 30, in <module> > import pyspark > File ".../spark/python/pyspark/__init__.py", line 46, in <module> > from pyspark.context import SparkContext > File ".../spark/python/pyspark/context.py", line 36, in <module> > from pyspark.java_gateway import launch_gateway > File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> > from py4j.java_gateway import java_import, JavaGateway, GatewayClient > File "<frozen importlib._bootstrap>", line 961, in _find_and_load > File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked > File "<frozen importlib._bootstrap>", line 646, in _load_unlocked > File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible > File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line > 18, in <module> > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", > line 62, in <module> > import pkgutil > File > "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", > line 22, in <module> > ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') > File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple > cls = _old_namedtuple(*args, **kwargs) > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > The problem is in > https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 > as the error says, and the cause seems to be that the arguments of > {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 > (see https://bugs.python.org/issue25628). > We currently copy this function via {{types.FunctionType}}, which does not set > the default values of keyword-only arguments (meaning > {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing > values in the function (non-bound arguments). > This ends up as below: > {code} > import types > import collections > def _copy_func(f): > return types.FunctionType(f.__code__, f.__globals__, f.__name__, > f.__defaults__, f.__closure__) > _old_namedtuple = _copy_func(collections.namedtuple) > _old_namedtuple("a", "b") > _old_namedtuple("a") > {code} > If we call as below: > {code} > >>> _old_namedtuple("a", "b") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', > 'rename', and 'module' > {code} > It throws an exception as above because {{__kwdefaults__}} for required > keyword arguments seems to be unset in the copied function. So, if we give explicit > values for these, > {code} > >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) > <class '__main__.a'> > {code} > It works fine. > It seems now we should properly set these into the hijacked one. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19019) PySpark does not work with Python 3.6.0
Hyukjin Kwon created SPARK-19019: Summary: PySpark does not work with Python 3.6.0 Key: SPARK-19019 URL: https://issues.apache.org/jira/browse/SPARK-19019 Project: Spark Issue Type: Bug Components: PySpark Reporter: Hyukjin Kwon Priority: Critical Currently, PySpark does not work with Python 3.6.0. Running {{./bin/pyspark}} simply throws the error as below: {code} Traceback (most recent call last): File ".../spark/python/pyspark/shell.py", line 30, in <module> import pyspark File ".../spark/python/pyspark/__init__.py", line 46, in <module> from pyspark.context import SparkContext File ".../spark/python/pyspark/context.py", line 36, in <module> from pyspark.java_gateway import launch_gateway File ".../spark/python/pyspark/java_gateway.py", line 31, in <module> from py4j.java_gateway import java_import, JavaGateway, GatewayClient File "<frozen importlib._bootstrap>", line 961, in _find_and_load File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 646, in _load_unlocked File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module> File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module> import pkgutil File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module> ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg') File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple cls = _old_namedtuple(*args, **kwargs) TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module' {code} The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 as the error says, and the cause seems to be that the arguments of {{namedtuple}} are now completely keyword-only arguments as of Python 3.6.0 (see https://bugs.python.org/issue25628). We currently copy this function via {{types.FunctionType}}, which does not set the default values of keyword-only arguments (meaning {{namedtuple.__kwdefaults__}}), and this seems to cause internally missing values in the function (non-bound arguments). This ends up as below: {code} import types import collections def _copy_func(f): return types.FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__) _old_namedtuple = _copy_func(collections.namedtuple) _old_namedtuple("a", "b") _old_namedtuple("a") {code} If we call as below: {code} >>> _old_namedtuple("a", "b") Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module' {code} It throws an exception as above because {{__kwdefaults__}} for required keyword arguments seems to be unset in the copied function. So, if we give explicit values for these, {code} >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None) <class '__main__.a'> {code} It works fine. It seems now we should properly set these into the hijacked one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19018: Assignee: (was: Apache Spark) > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19018: Assignee: Apache Spark > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen >Assignee: Apache Spark > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19018) spark csv writer charset support
[ https://issues.apache.org/jira/browse/SPARK-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784278#comment-15784278 ] Apache Spark commented on SPARK-19018: -- User 'cjuexuan' has created a pull request for this issue: https://github.com/apache/spark/pull/16428 > spark csv writer charset support > > > Key: SPARK-19018 > URL: https://issues.apache.org/jira/browse/SPARK-19018 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: todd.chen > > If we write a DataFrame to CSV, the default charset is UTF-8 and we can't > change it, unlike reading CSV where we can set `encoding` in the params. So I > think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19018) spark csv writer charset support
todd.chen created SPARK-19018: - Summary: spark csv writer charset support Key: SPARK-19018 URL: https://issues.apache.org/jira/browse/SPARK-19018 Project: Spark Issue Type: Bug Components: SQL Reporter: todd.chen If we write a DataFrame to CSV, the default charset is UTF-8 and we can't change it, unlike reading CSV where we can set `encoding` in the params. So I think we should support passing a charset param for CSV writes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
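A short sketch of the asymmetry being reported (paths and charset are illustrative): the reader already accepts an {{encoding}} option, while the writer, as of this report, always emits UTF-8.
{code}
val df = sparkSession.read.option("encoding", "GBK").csv("/data/in.csv")
df.write.csv("/data/out.csv") // always written as UTF-8; no charset option is honored here
{code}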
[jira] [Updated] (SPARK-19007) Speedup and optimize the GradientBoostedTrees in the "data>memory" scene
[ https://issues.apache.org/jira/browse/SPARK-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-19007: -- Component/s: (was: MLlib) > Speedup and optimize the GradientBoostedTrees in the "data>memory" scene > > > Key: SPARK-19007 > URL: https://issues.apache.org/jira/browse/SPARK-19007 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.0.1, 2.0.2, 2.1.0 > Environment: A CDH cluster consisting of 3 Red Hat servers (120G > memory, 40 cores, 43TB disk per server). >Reporter: zhangdenghui >Priority: Minor > Original Estimate: 168h > Remaining Estimate: 168h > > Test data: 80G of CTR training data from criteolabs > (http://criteolabs.wpengine.com/downloads/download-terabyte-click-logs/); I used > 1 of the 24 days' data. Some features needed to be replaced by newly generated > continuous features; the way to generate the new features follows the approach > mentioned in the xgboost paper. > Resources allocated: Spark on YARN, 20 executors, 8G memory and 2 cores per > executor. > Parameters: numIterations 10, maxdepth 8, the rest of the parameters are defaults. > I tested the GradientBoostedTrees algorithm in mllib using the 80G CTR data > mentioned above. > It took 1.5 hours in total, and I found many task failures after 6 or 7 GBT > rounds. Without these task failures and task retries it could be much faster, > which would save about half the time. I think this is caused by the RDD named > predError in the while loop of the boost method in GradientBoostedTrees.scala: > the lineage of the RDD named predError grows after every GBT round, and then it > causes failures like this: (ExecutorLostFailure (executor 6 exited caused by one > of the running tasks) Reason: Container killed by YARN for exceeding memory > limits. 10.2 GB of 10 GB physical memory used. Consider boosting > spark.yarn.executor.memoryOverhead.). > I tried boosting spark.yarn.executor.memoryOverhead, but the memory it needed > is too much (even increasing the memory by half can't solve the problem), so I > think it's not a proper method. > Setting the predCheckpoint interval smaller would cut the lineage, but it > increases IO cost a lot. > I tried another way to solve this problem: I persisted the RDD named predError > every round, used pre_predError to record the previous RDD, and unpersisted it > because it is useless afterwards. > Finally it took about 45 min with my method, no task failures occurred, and no > extra memory was added. > So when the data is much larger than memory, my little improvement can speed > up GradientBoostedTrees by 1-2x. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
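A hedged sketch of the pattern the reporter describes (helper names such as {{updatePredError}} are hypothetical, standing in for the GradientBoostedTrees internals): persist each round's prediction-error RDD and unpersist the previous one, so the lineage and retry cost stop growing with the number of rounds.
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

var predError: RDD[(Double, Double)] = computeInitialPredError(input) // hypothetical helper
predError.persist(StorageLevel.MEMORY_AND_DISK)

for (m <- 1 until numIterations) {
  val prevPredError = predError
  predError = updatePredError(prevPredError, trees(m)) // hypothetical helper for one round
  predError.persist(StorageLevel.MEMORY_AND_DISK)
  predError.count()                         // materialize before dropping the previous round
  prevPredError.unpersist(blocking = false) // previous round's errors are no longer needed
}
{code}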
[jira] [Commented] (SPARK-18948) Add Mean Percentile Rank metric for ranking algorithms
[ https://issues.apache.org/jira/browse/SPARK-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784216#comment-15784216 ] Joseph K. Bradley commented on SPARK-18948: --- Thanks [~danilo.ascione] for suggesting this. A few initial comments: * We are not accepting new features in the RDD-based API (spark.mllib), but only in the DataFrame-based API (spark.ml). If you'd like to get this in, it will need to be rewritten for the DataFrame-based API. * (minor) Please don't set the Shepherd field; committers will use it to track releases. I haven't been able to check out the PR yet, but if broader discussions are being brought up there, then let's discuss the issues on this JIRA first before further implementation work. Thanks! > Add Mean Percentile Rank metric for ranking algorithms > -- > > Key: SPARK-18948 > URL: https://issues.apache.org/jira/browse/SPARK-18948 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Danilo Ascione > > Add the Mean Percentile Rank (MPR) metric for ranking algorithms, as > described in the paper: > Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for Implicit > Feedback Datasets.” In 2008 Eighth IEEE International Conference on Data > Mining, 263–72, 2008. doi:10.1109/ICDM.2008.22. > (http://yifanhu.net/PUB/cf.pdf) (NB: MPR is called "Expected percentile rank" > in the paper) > The ALS algorithm for implicit feedback in Spark ML is based on the same > paper. > Spark ML lacks an implementation of an appropriate metric for implicit > feedback, so the MPR metric can fulfill this use case. > This implementation adds the metric to the RankingMetrics class under > org.apache.spark.mllib.evaluation (SPARK-3568), and it uses the same input > (prediction and label pairs). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
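For reference, a sketch of the metric as defined in the cited paper (my reading of it, not the PR's code): MPR = sum(r_ui * rank_ui) / sum(r_ui), where rank_ui in [0, 1] is item i's percentile position in user u's ranked list; lower is better, and a random ranker scores about 0.5.
{code}
// users(u) = relevance values r_ui, already ordered by the model's ranking for user u
def meanPercentileRank(users: Seq[Array[Double]]): Double = {
  var num = 0.0
  var den = 0.0
  for (rs <- users; i <- rs.indices) {
    val rank = if (rs.length == 1) 0.0 else i.toDouble / (rs.length - 1) // percentile in [0, 1]
    num += rs(i) * rank
    den += rs(i)
  }
  if (den == 0.0) 0.0 else num / den
}
{code}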
[jira] [Updated] (SPARK-18948) Add Mean Percentile Rank metric for ranking algorithms
[ https://issues.apache.org/jira/browse/SPARK-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18948: -- Shepherd: (was: Xiangrui Meng) > Add Mean Percentile Rank metric for ranking algorithms > -- > > Key: SPARK-18948 > URL: https://issues.apache.org/jira/browse/SPARK-18948 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Danilo Ascione > > Add the Mean Percentile Rank (MPR) metric for ranking algorithms, as > described in the paper: > Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for Implicit > Feedback Datasets.” In 2008 Eighth IEEE International Conference on Data > Mining, 263–72, 2008. doi:10.1109/ICDM.2008.22. > (http://yifanhu.net/PUB/cf.pdf) (NB: MPR is called "Expected percentile rank" > in the paper) > The ALS algorithm for implicit feedback in Spark ML is based on the same > paper. > Spark ML lacks an implementation of an appropriate metric for implicit > feedback, so the MPR metric can fulfill this use case. > This implementation adds the metric to the RankingMetrics class under > org.apache.spark.mllib.evaluation (SPARK-3568), and it uses the same input > (prediction and label pairs). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18929) Add Tweedie distribution in GLM
[ https://issues.apache.org/jira/browse/SPARK-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18929: -- Affects Version/s: (was: 2.0.2) > Add Tweedie distribution in GLM > --- > > Key: SPARK-18929 > URL: https://issues.apache.org/jira/browse/SPARK-18929 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Wayne Zhang > Labels: features > Original Estimate: 72h > Remaining Estimate: 72h > > I propose to add the full Tweedie family into the GeneralizedLinearRegression > model. The Tweedie family is characterized by a power variance function. > The currently supported Gaussian, Poisson and Gamma families are special > cases of the [Tweedie|https://en.wikipedia.org/wiki/Tweedie_distribution]. > I propose to add support for the other distributions: > * compound Poisson: 1 < variancePower < 2. This one is widely used to model > zero-inflated continuous distributions. > * positive stable: variancePower > 2 and variancePower != 3. Used to model > extreme values. > * inverse Gaussian: variancePower = 3. > The Tweedie family is supported in most statistical packages such as R > (statmod), SAS, h2o etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
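For context, the property that ties the proposal's list together (standard Tweedie background rather than Spark code): a Tweedie distribution has variance {{phi * mu^variancePower}}, and each listed family corresponds to a particular power.
{code}
def tweedieVariance(mu: Double, phi: Double, variancePower: Double): Double =
  phi * math.pow(mu, variancePower)

// variancePower: 0 -> Gaussian, 1 -> Poisson, (1, 2) -> compound Poisson,
// 2 -> Gamma, 3 -> inverse Gaussian
{code}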
[jira] [Commented] (SPARK-18862) Split SparkR mllib.R into multiple files
[ https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784190#comment-15784190 ] Joseph K. Bradley commented on SPARK-18862: --- I like the chosen organization too! > Split SparkR mllib.R into multiple files > > > Key: SPARK-18862 > URL: https://issues.apache.org/jira/browse/SPARK-18862 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang > > SparkR mllib.R is getting bigger as we add more ML wrappers, so I'd like to > split it into multiple files to make it easier to maintain: > * mllibClassification.R > * mllibRegression.R > * mllibClustering.R > * mllibFeature.R > or: > * mllib/classification.R > * mllib/regression.R > * mllib/clustering.R > * mllib/features.R > By R convention, the first way is preferred. And I'm not sure whether R > supports the second way of organizing files (will check later). Please let me > know your preference. I think the start of a new release cycle is a good > opportunity to do this, since it will involve fewer conflicts. If this > proposal is approved, I can work on it. > cc [~felixcheung] [~josephkb] [~mengxr] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16552) Store the Inferred Schemas into External Catalog Tables when Creating Tables
[ https://issues.apache.org/jira/browse/SPARK-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784183#comment-15784183 ] Xiao Li commented on SPARK-16552:
---
[~yhuai] Yeah, see the discussion in https://github.com/apache/spark/pull/15983#issuecomment-267836485. I think we need to document the behavior changes.
> Store the Inferred Schemas into External Catalog Tables when Creating Tables
>
> Key: SPARK-16552
> URL: https://issues.apache.org/jira/browse/SPARK-16552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
> Assignee: Xiao Li
> Labels: release_notes, releasenotes
> Fix For: 2.1.0
>
> Currently, in Spark SQL, the initial creation of a schema falls into two groups. This applies to both Hive tables and Data Source tables:
> Group A. Users specify the schema.
> Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema of the SELECT clause. For example,
> {noformat}
> CREATE TABLE tab STORED AS TEXTFILE
> AS SELECT * from input
> {noformat}
> Case 2 CREATE TABLE: users explicitly specify the schema. For example,
> {noformat}
> CREATE TABLE jsonTable (_1 string, _2 string)
> USING org.apache.spark.sql.json
> {noformat}
> Group B. Spark SQL infers the schema at runtime.
> Case 3 CREATE TABLE: users do not specify the schema, but the path to the file location. For example,
> {noformat}
> CREATE TABLE jsonTable
> USING org.apache.spark.sql.json
> OPTIONS (path '${tempDir.getCanonicalPath}')
> {noformat}
> Currently, Spark SQL does not store the inferred schema in the external catalog for the cases in Group B. When users refresh the metadata cache, or access the table for the first time after (re-)starting Spark, Spark SQL infers the schema and stores the info in the metadata cache to improve the performance of subsequent metadata requests. However, the runtime schema inference could cause undesirable schema changes after each reboot of Spark.
> It is desirable to store the inferred schema in the external catalog when creating the table. When users intend to refresh the schema, they issue `REFRESH TABLE`. Spark SQL will then infer the schema again based on the previously specified table location and update/refresh the schema in the external catalog and metadata cache.
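For example, once the inferred schema is persisted, a change in the underlying files would be picked up only when explicitly requested, rather than on every restart (illustrative only, reusing the jsonTable name from the examples above):
{noformat}
REFRESH TABLE jsonTable
{noformat}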
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784174#comment-15784174 ] Apache Spark commented on SPARK-19012:
--
User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/16427
[jira] [Assigned] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19012:
--
Assignee: Apache Spark
[jira] [Assigned] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19012:
--
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784169#comment-15784169 ] Dongjoon Hyun commented on SPARK-19012:
---
In the API docs and many other places, `createOrReplaceTempView` was assumed not to throw any exceptions. It seems we need to discuss this on my PR.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784083#comment-15784083 ] Dongjoon Hyun commented on SPARK-19012:
---
Thank you for the decision. Yep, I'll make the PR like that.
[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784074#comment-15784074 ] Herman van Hovell edited comment on SPARK-19012 at 12/29/16 12:21 AM:
--
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain how you are using this?
[~dongjoon] let's just make the exception better for now.

was (Author: hvanhovell):
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain where you are using this.
[~dongjoon] let's just make the exception better for now.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784074#comment-15784074 ] Herman van Hovell commented on SPARK-19012:
---
Yeah, you have a point there. I was wondering if we would hit an issue here. The question is what we want to support:
* SQL compatibility. This would be one of the more common use cases. In that case it really does not make sense to support an identifier like '1', because that would fail in SQL.
* As much flexibility as you want.
[~jzijlstra] could you explain where you are using this.
[~dongjoon] let's just make the exception better for now.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784062#comment-15784062 ] Dongjoon Hyun commented on SPARK-19012:
---
Ur, actually, we already support `createOrReplaceTempView("`1`")`.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784056#comment-15784056 ] Dongjoon Hyun commented on SPARK-19012:
---
BTW, [~hvanhovell], I found the existing related issue and test cases.
{code}
test("SPARK-12982: Add table name validation in temp table registration") {
  val df = Seq("foo", "bar").map(Tuple1.apply).toDF("col")
  // invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("t~"))
  // valid table name test as below
  df.createOrReplaceTempView("table1")
  // another invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("#$@sum"))
  // another invalid table name test as below
  intercept[AnalysisException](df.createOrReplaceTempView("table!#"))
}
{code}
To be consistent with this, we should throw AnalysisException on `createOrReplaceTempView("1")`. So, what we want here is to support `createOrReplaceTempView("`1`")`. Did I understand correctly?
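In other words, callers who need an arbitrary name would quote it themselves, along the lines of the sketch below. This is only an illustration of the backtick-quoting idea under discussion, assuming embedded backticks are escaped by doubling them (per the BACKQUOTED_IDENTIFIER grammar rule); it is not the actual patch:
{code}
// Sketch only: wrap an arbitrary view name in backticks so the parser
// treats it as a quoted identifier; doubling embedded backticks is an
// assumption based on the BACKQUOTED_IDENTIFIER rule.
def quoteIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"

df.createOrReplaceTempView(quoteIdentifier("1")) // registers `1`
sparkSession.sql(s"SELECT * FROM ${quoteIdentifier("1")}").show()
{code}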
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783990#comment-15783990 ] Dongjoon Hyun commented on SPARK-19012:
---
No problem. However, we still need to raise AnalysisException on an empty table name, ``.
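In test form, that extra case might look like the following one-liner, written in the style of the SPARK-12982 test quoted above (an assumption, not committed code):
{code}
// Sketch: an empty name should still be rejected, since quoting it would
// only produce the invalid identifier ``.
intercept[AnalysisException](df.createOrReplaceTempView(""))
{code}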
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783992#comment-15783992 ] Dongjoon Hyun commented on SPARK-19012:
---
+1
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783988#comment-15783988 ] Herman van Hovell commented on SPARK-19012:
---
Yeah, maybe a bit more subtle than that (we need to escape backticks in the name).
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783983#comment-15783983 ] Dongjoon Hyun commented on SPARK-19012:
---
Oh, you mean always wrap the name with backticks, right?
[jira] [Comment Edited] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783966#comment-15783966 ] Herman van Hovell edited comment on SPARK-19012 at 12/28/16 11:23 PM:
--
[~dongjoon] Could you make a PR that puts the name in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?

was (Author: hvanhovell):
[~dongjoon] Could you make a PR that puts the code in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783966#comment-15783966 ] Herman van Hovell commented on SPARK-19012:
---
[~dongjoon] Could you make a PR that puts the code in backticks instead? That is a bit more friendly to the end user. Or do you think we will break stuff if we do?
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783958#comment-15783958 ] Dongjoon Hyun commented on SPARK-19012: --- Hi, [~hvanhovell] and [~jzijlstra]. I'll make a PR to raise `AnalysisException` instead. > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
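A rough sketch of what such a guard could look like (illustrative only, not the actual PR; note that AnalysisException's constructor is visible only inside the org.apache.spark.sql package, so the check would live there):
{code}
import org.apache.spark.sql.AnalysisException

// Hypothetical validation run before the name reaches the parser, so the
// user gets one clear message instead of the full expected-token dump.
def validateViewName(name: String): Unit = {
  if (name.headOption.exists(_.isDigit)) {
    throw new AnalysisException(
      s"Invalid view name '$name': identifiers may not start with a digit " +
        "unless quoted with backticks.")
  }
}
{code}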
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783937#comment-15783937 ] Herman van Hovell commented on SPARK-19017: --- [~nsyca] Why is this incorrect? If I rewrite the NOT IN into a WHERE clause this would become: {noformat} select * from t1 where (a1 <> 1 AND b1 <> NULL) {noformat} That WHERE clause would evaluate to NULL, and it would never return a result. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
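For what it's worth, the two possible expansions behave differently under three-valued logic; a spark-shell sketch using the values from the description below (t1 = (2, 1), t2 = (1, null)):
{code}
// Tuple inequality NOT (a1 = a2 AND b1 = b2) is, by De Morgan,
// (a1 <> a2 OR b1 <> b2):   TRUE OR UNKNOWN  -> TRUE    (row qualifies)
// The conjunctive form (a1 <> a2 AND b1 <> b2) instead gives
//                           TRUE AND UNKNOWN -> UNKNOWN (row filtered)
spark.sql("SELECT (2 <> 1 OR 1 <> NULL) AS disjunctive, " +
  "(2 <> 1 AND 1 <> NULL) AS conjunctive").show()
// disjunctive: true, conjunctive: null
{code}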
[jira] [Updated] (SPARK-17847) Reduce shuffled data size of GaussianMixture & copy the implementation from mllib to ml
[ https://issues.apache.org/jira/browse/SPARK-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-17847: -- Target Version/s: 2.2.0 > Reduce shuffled data size of GaussianMixture & copy the implementation from > mllib to ml > --- > > Key: SPARK-17847 > URL: https://issues.apache.org/jira/browse/SPARK-17847 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Yanbo Liang >Assignee: Yanbo Liang > > Copy {{GaussianMixture}} implementation from mllib to ml, then we can add new > features to it. > I left mllib {{GaussianMixture}} untouched, unlike some other algorithms that > wrap the ml implementation, for the following reasons: > * mllib {{GaussianMixture}} allows k == 1, but ml does not. > * mllib {{GaussianMixture}} supports setting an initial model, but ml does not > currently. (We will definitely add this feature to ml in the future.) > Meanwhile, there is a big performance improvement for {{GaussianMixture}} in > this task. Since the covariance matrix of a multivariate gaussian distribution > is symmetric, we can store only the upper triangular part of the matrix, which > greatly reduces the shuffled data size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
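A rough sketch of the packing idea for the symmetric covariance matrix (column-major layout as in Breeze dense matrices; illustrative, not the actual patch):
{code}
// Pack the upper triangle of a symmetric n x n matrix (column-major array)
// into an array of length n * (n + 1) / 2, roughly halving the data that
// has to be shuffled.
def packUpperTriangle(n: Int, m: Array[Double]): Array[Double] = {
  val packed = new Array[Double](n * (n + 1) / 2)
  var idx = 0
  var j = 0
  while (j < n) {
    var i = 0
    while (i <= j) {
      packed(idx) = m(j * n + i) // element (i, j) with i <= j
      idx += 1
      i += 1
    }
    j += 1
  }
  packed
}
{code}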
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783821#comment-15783821 ] Devaraj K commented on SPARK-15359: --- [~yu2003w], it seems you are also facing the same issue I mentioned in the description. I have already created a PR for this issue; do you have a chance to try it and let me know your feedback? > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status from mesosDriver.run() during > successful registration, but if mesosDriver.run() returns > DRIVER_ABORTED status after successful registration then there is no action > for the status and the thread will be terminated. > I think we need to throw an exception and shut down the dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
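A minimal sketch of the proposed handling (method name and surrounding structure illustrative; only SchedulerDriver.run() and the status constant are from the Mesos API):
{code}
import org.apache.mesos.{Protos, SchedulerDriver}

// Hypothetical: surface an aborted driver instead of letting the thread
// exit silently, so the dispatcher can shut itself down.
def runMesosDriver(mesosDriver: SchedulerDriver): Unit = {
  val status = mesosDriver.run() // blocks until the driver stops or aborts
  if (status == Protos.Status.DRIVER_ABORTED) {
    throw new IllegalStateException(
      s"Mesos scheduler driver aborted (status $status); shutting down the dispatcher")
  }
}
{code}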
[jira] [Commented] (SPARK-16552) Store the Inferred Schemas into External Catalog Tables when Creating Tables
[ https://issues.apache.org/jira/browse/SPARK-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783705#comment-15783705 ] Yin Huai commented on SPARK-16552: -- [~smilegator] [~cloud_fan] I think we will not do partition discovery by default after SPARK-17861, right? Can you help me check whether we still need to write anything about this in the release notes? > Store the Inferred Schemas into External Catalog Tables when Creating Tables > > > Key: SPARK-16552 > URL: https://issues.apache.org/jira/browse/SPARK-16552 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Labels: release_notes, releasenotes > Fix For: 2.1.0 > > > Currently, in Spark SQL, the initial creation of schema can be classified > into two groups. It is applicable to both Hive tables and Data Source tables: > Group A. Users specify the schema. > Case 1 CREATE TABLE AS SELECT: the schema is determined by the result schema > of the SELECT clause. For example, > {noformat} > CREATE TABLE tab STORED AS TEXTFILE > AS SELECT * from input > {noformat} > Case 2 CREATE TABLE: users explicitly specify the schema. For example, > {noformat} > CREATE TABLE jsonTable (_1 string, _2 string) > USING org.apache.spark.sql.json > {noformat} > Group B. Spark SQL infers the schema at runtime. > Case 3 CREATE TABLE. Users do not specify the schema but the path to the file > location. For example, > {noformat} > CREATE TABLE jsonTable > USING org.apache.spark.sql.json > OPTIONS (path '${tempDir.getCanonicalPath}') > {noformat} > Now, Spark SQL does not store the inferred schema in the external catalog for > the cases in Group B. When users refresh the metadata cache or access the > table for the first time after (re-)starting Spark, Spark SQL will infer the > schema and store the info in the metadata cache to improve the performance > of subsequent metadata requests. However, the runtime schema inference could > cause undesirable schema changes after each reboot of Spark. > It is desirable to store the inferred schema in the external catalog when > creating the table. When users intend to refresh the schema, they issue > `REFRESH TABLE`. Spark SQL will infer the schema again based on the > previously specified table location and update/refresh the schema in the > external catalog and metadata cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783653#comment-15783653 ] Nattavut Sutyanyong commented on SPARK-19017: - The semantics of the NOT IN for multiple columns T1(a1, b1, ... ) NOT IN T2(a2, b2, ...) is # For any rows of T1, if a1 <> ALL (T2.a2), those rows are returned. # For any rows of T1, if a1 = ANY (T2.a2), take the qualified rows from T1 and T2 and compare the values from the next pair of columns with a condition similar to step 1. -- if b1 <> ALL (T2.b2), those rows are returned. # Repeat the steps until the last pair in the column list. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
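Tracing those steps against the data in the description (a spark-shell sketch; the expected result is shown in comments):
{code}
Seq((2, 1)).toDF("a1", "b1").createOrReplaceTempView("t1")
Seq[(java.lang.Integer, java.lang.Integer)]((1, null))
  .toDF("a2", "b2").createOrReplaceTempView("t2")

// Step 1: a1 = 2 and the only a2 is 1, so a1 <> ALL (t2.a2) holds and the
// row (2, 1) should be returned; step 2 (comparing b1 with b2) is never reached.
spark.sql("select * from t1 where (a1, b1) not in (select a2, b2 from t2)").show()
// expected: one row (2, 1); the empty result shown in the description is the bug
{code}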
[jira] [Resolved] (SPARK-18958) SparkR should support toJSON on DataFrame
[ https://issues.apache.org/jira/browse/SPARK-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-18958. -- Resolution: Fixed Target Version/s: 2.2.0 > SparkR should support toJSON on DataFrame > - > > Key: SPARK-18958 > URL: https://issues.apache.org/jira/browse/SPARK-18958 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Minor > > It makes it easier to interoperate with other components (esp. since R does not > have JSON support built in) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
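For context, the Scala side already exposes this as {{Dataset.toJSON}}, which the SparkR function mirrors; a quick spark-shell illustration:
{code}
// toJSON turns each row into a JSON string, handy for interop with
// components that consume JSON rather than R data frames.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
df.toJSON.show(truncate = false)
// {"id":1,"value":"a"}
// {"id":2,"value":"b"}
{code}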
[jira] [Created] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
Nattavut Sutyanyong created SPARK-19017: --- Summary: NOT IN subquery with more than one column may return incorrect results Key: SPARK-19017 URL: https://issues.apache.org/jira/browse/SPARK-19017 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0 Reporter: Nattavut Sutyanyong When putting more than one column in the NOT IN, the query may not return correctly if there is null data. We can demonstrate the problem with the following data set and query: {code} Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show +---+---+ | a1| b1| +---+---+ +---+---+ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18669) Update Apache docs regard watermarking in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783633#comment-15783633 ] Apache Spark commented on SPARK-18669: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/16425 > Update Apache docs regard watermarking in Structured Streaming > -- > > Key: SPARK-18669 > URL: https://issues.apache.org/jira/browse/SPARK-18669 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543 ] Josh Bacon edited comment on SPARK-18737 at 12/28/16 7:39 PM: -- Hi Sean, We've performed more tests and are experiencing the same issues with the following minimal code reproduction. (Spark 2.0.2 w/ prebuilt hadoop 2.7): {code:title=Bar.scala|borderStyle=solid} import org.apache.spark.sql.SparkSession import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.kinesis.KinesisUtils import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream object StreamingFromKinesisTest { def main(args: Array[String]) { val endpointUrl = "https://kinesis.us-west-2.amazonaws.com"; val streamName = args(0); val appName = args(1); //DynamoDB name val region = "us-west-2"; val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate(); val batchInterval = Seconds(10); val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval); val kinesisStreams = (0 until 2).map { _ => KinesisUtils.createStream(streamingContext,appName,streamName,endpointUrl,region,InitialPositionInStream.TRIM_HORIZON,batchInterval,StorageLevel.MEMORY_AND_DISK_2); }; val streamOfArrayBytes = streamingContext.union(kinesisStreams); val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes)); streamStrings.foreachRDD((rddString, timestamp) => { println(timestamp); if (!rddString.isEmpty()) { println("Success!"); } }); streamingContext.start(); streamingContext.awaitTerminationOrTimeout(600) } } {code} {panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} 16/12/28 11:02:40 INFO BlockManager: Removing RDD 15 16/12/28 11:02:40 INFO BlockManager: Removing RDD 13 16/12/28 11:02:40 INFO BlockManager: Removing RDD 14 16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72 16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72) 16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1762.0 B, free 366.3 MB) 16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.6 KB, free 366.3 MB) 16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps) 16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely 16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72) com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994 at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at 
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) at org.apache.spark.SparkContext$$a
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783543#comment-15783543 ] Josh Bacon commented on SPARK-18737: Hi Sean, We've performed more tests and are experiencing the same issues with the following minimal code reproduction. (Spark 2.0.2 w/ prebuilt hadoop 2.7): {code:title=Bar.scala|borderStyle=solid} import org.apache.spark.sql.SparkSession import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.kinesis.KinesisUtils import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream object StreamingFromKinesisTest { def main(args: Array[String]) { val endpointUrl = "https://kinesis.us-west-2.amazonaws.com"; val streamName = args(0); val appName = args(1); //DynamoDB name val region = "us-west-2"; val sparkSession = SparkSession.builder.appName("StreamingFromKinesisTest").getOrCreate(); val batchInterval = Seconds(10); val streamingContext = new StreamingContext(sparkSession.sparkContext, batchInterval); val kinesisStreams = (0 until 2).map { _ => KinesisUtils.createStream(streamingContext,appName,streamName,endpointUrl,region,InitialPositionInStream.TRIM_HORIZON,batchInterval,StorageLevel.MEMORY_AND_DISK_2); }; val streamOfArrayBytes = streamingContext.union(kinesisStreams); val streamStrings = streamOfArrayBytes.map(arrayBytes => new String(arrayBytes)); streamStrings.foreachRDD((rddString, timestamp) => { println(timestamp); if (!rddString.isEmpty()) { println("Success!"); } }); streamingContext.start(); streamingContext.awaitTerminationOrTimeout(600) } } {code} {panel:title=Executor Log Snippet|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} 16/12/28 11:02:40 INFO BlockManager: Removing RDD 15 16/12/28 11:02:40 INFO BlockManager: Removing RDD 13 16/12/28 11:02:40 INFO BlockManager: Removing RDD 14 16/12/28 11:02:53 INFO CoarseGrainedExecutorBackend: Got assigned task 72 16/12/28 11:02:53 INFO Executor: Running task 0.0 in stage 4.0 (TID 72) 16/12/28 11:02:53 INFO TorrentBroadcast: Started reading broadcast variable 4 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1762.0 B, free 366.3 MB) 16/12/28 11:02:53 INFO TorrentBroadcast: Reading broadcast variable 4 took 10 ms 16/12/28 11:02:53 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.6 KB, free 366.3 MB) 16/12/28 11:02:53 INFO TransportClientFactory: Successfully created connection to /172.21.50.111:5000 after 22 ms (21 ms spent in bootstraps) 16/12/28 11:02:54 INFO BlockManager: Found block input-1-1482951722353 remotely 16/12/28 11:02:54 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 72) com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 13994 at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389) at 
scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at scala.collection.AbstractIterator.to(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1324) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
[jira] [Commented] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783538#comment-15783538 ] Apache Spark commented on SPARK-19016: -- User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/16424 > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19016: Assignee: Cheng Lian (was: Apache Spark) > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Cheng Lian >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19016) Document scalable partition handling feature in the programming guide
[ https://issues.apache.org/jira/browse/SPARK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19016: Assignee: Apache Spark (was: Cheng Lian) > Document scalable partition handling feature in the programming guide > - > > Key: SPARK-19016 > URL: https://issues.apache.org/jira/browse/SPARK-19016 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.1.0, 2.2.0 >Reporter: Cheng Lian >Assignee: Apache Spark >Priority: Minor > > Currently, we only mention this in the migration guide. Should also document > it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10878) Race condition when resolving Maven coordinates via Ivy
[ https://issues.apache.org/jira/browse/SPARK-10878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783531#comment-15783531 ] Andrew Snare commented on SPARK-10878: -- I see this with Spark 2.0 as well. There doesn't appear to be a good workaround, although I assume avoiding {{--packages}} means the Ivy cache isn't used and therefore the conflict can't occur. > Race condition when resolving Maven coordinates via Ivy > --- > > Key: SPARK-10878 > URL: https://issues.apache.org/jira/browse/SPARK-10878 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Ryan Williams >Priority: Minor > > I've recently been shell-scripting the creation of many concurrent > Spark-on-YARN apps and observing a fraction of them to fail with what I'm > guessing is a race condition in their Maven-coordinate resolution. > For example, I might spawn an app for each path in file {{paths}} with the > following shell script: > {code} > cat paths | parallel "$SPARK_HOME/bin/spark-submit foo.jar {}" > {code} > When doing this, I observe some fraction of the spawned jobs to fail with > errors like: > {code} > :: retrieving :: org.apache.spark#spark-submit-parent > confs: [default] > Exception in thread "main" java.lang.RuntimeException: problem during > retrieve of org.apache.spark#spark-submit-parent: java.text.ParseException: > failed to parse report: > /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: > Premature end of file. > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:249) > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:83) > at org.apache.ivy.Ivy.retrieve(Ivy.java:551) > at > org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1006) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.text.ParseException: failed to parse report: > /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: > Premature end of file. > at > org.apache.ivy.plugins.report.XmlReportParser.parse(XmlReportParser.java:293) > at > org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:329) > at > org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:118) > ... 7 more > Caused by: org.xml.sax.SAXParseException; Premature end of file. > at > org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown > Source) > at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown > Source) > at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source) > {code} > The more apps I try to launch simultaneously, the greater fraction of them > seem to fail with this or similar errors; a batch of ~10 will usually work > fine, a batch of 15 will see a few failures, and a batch of ~60 will have > dozens of failures. > [This gist shows 11 recent failures I > observed|https://gist.github.com/ryan-williams/648bff70e518de0c7c84]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
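One possible mitigation until this is fixed (untested sketch): give each concurrent submission its own Ivy directory through the {{spark.jars.ivy}} configuration so the processes never contend on a shared cache, e.g. with GNU parallel's job-number placeholder:
{noformat}
cat paths | parallel "$SPARK_HOME/bin/spark-submit --conf spark.jars.ivy=/tmp/ivy-{#} foo.jar {}"
{noformat}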
[jira] [Created] (SPARK-19016) Document scalable partition handling feature in the programming guide
Cheng Lian created SPARK-19016: -- Summary: Document scalable partition handling feature in the programming guide Key: SPARK-19016 URL: https://issues.apache.org/jira/browse/SPARK-19016 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.1.0, 2.2.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor Currently, we only mention this in the migration guide. Should also document it in the programming guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18966) NOT IN subquery with correlated expressions may return incorrect result
[ https://issues.apache.org/jira/browse/SPARK-18966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783457#comment-15783457 ] Nattavut Sutyanyong edited comment on SPARK-18966 at 12/28/16 6:40 PM: --- Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3. When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups: Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) are filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1 is null, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualify as the result set, that is, the rows from (2.1, 2.2 and 2.3). In conclusion, the correct result set of the above query is {code} T1.A1 T1.B1 - - 2 1(1.2) 1 null(2.1) null 2(2.2 & 2.3) {code} was (Author: nsyca): Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3.
When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups:- Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) is filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualifies as the result set, that is, the rows fr
[jira] [Commented] (SPARK-18966) NOT IN subquery with correlated expressions may return incorrect result
[ https://issues.apache.org/jira/browse/SPARK-18966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783457#comment-15783457 ] Nattavut Sutyanyong commented on SPARK-18966: - Considering the following subquery: {code} select * from t1 where a1 not in (select a2 from t2 where t2.b2 = t1.b1) {code} There are a number of scenarios to consider: - 1. When the correlated predicate yields a match (i.e., T2.B2 = T1.B1) -- 1.1. When the NOT IN expression yields a match (i.e., T1.A1 = T2.A2) -- 1.2. When the NOT IN expression yields no match (i.e., T1.A1 = T2.A2 returns false) -- 1.3. When T1.A1 is null -- 1.4. When T2.A2 is null --- 1.4.1. When T1.A1 is not null --- 1.4.2. When T1.A1 is null - 2. When the correlated predicate yields no match (i.e., T2.B2 = T1.B1 is false or unknown) -- 2.1. When T2.B2 is null and T1.B1 is null -- 2.2. When T2.B2 is null and T1.B1 is not null -- 2.3. When the value of T1.B1 does not match any of T2.B2 {code} T1.A1 T1.B1 T2.A2 T2.B2 - - - - 1 1 1 1(1.1) 2 1 (1.2) null 1 (1.3) 1 3 null 3(1.4.1) null 3 (1.4.2) 1 null 1 null(2.1) null 2 (2.2 & 2.3) {code} We can divide the evaluation of the above correlated NOT IN subquery into 2 groups: Group 1: The rows in T1 when there is a match from the correlated predicate (T1.B1 = T2.B2) In this case, the result of the subquery is not empty and the semantics of the NOT IN depends solely on the evaluation of the equality comparison of the columns of NOT IN, i.e., A1 = A2, which says # If T1.A1 is null, the row is filtered (1.3 and 1.4.2) # If T1.A1 = T2.A2, the row is filtered (1.1) # If T2.A2 is null, any rows of T1 in the same group (T1.B1 = T2.B2) are filtered (1.4.1 & 1.4.2) # Otherwise, the row is qualified. Hence, in this group, the result is the row from (1.2). Group 2: The rows in T1 when there is no match from the correlated predicate (T1.B1 = T2.B2) In this case, all the rows in T1, including the rows where T1.A1 is null, are qualified because the subquery returns an empty set and by the semantics of the NOT IN, all rows from the parent side qualify as the result set, that is, the rows from (2.1, 2.2 and 2.3). In conclusion, the correct result set of the above query is {code} T1.A1 T1.B1 - - 2 1(1.2) 1 null(2.1) null 2(2.2 & 2.3) {code} > NOT IN subquery with correlated expressions may return incorrect result > --- > > Key: SPARK-18966 > URL: https://issues.apache.org/jira/browse/SPARK-18966 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Labels: correctness > > {code} > Seq((1, 2)).toDF("a1", "b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer, java.lang.Integer)]((1, null)).toDF("a2", > "b2").createOrReplaceTempView("t2") > // The expected result is 1 row of (1,2) as shown in the next statement. > sql("select * from t1 where a1 not in (select a2 from t2 where b2 = b1)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > sql("select * from t1 where a1 not in (select a2 from t2 where b2 = 2)").show > +---+---+ > | a1| b1| > +---+---+ > | 1| 2| > +---+---+ > {code} > The two SQL statements above should return the same result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3246) Support weighted SVMWithSGD for classification of unbalanced dataset
[ https://issues.apache.org/jira/browse/SPARK-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783435#comment-15783435 ] Sheridan Rawlins commented on SPARK-3246: - Hey, I have a solution that just uses liblinear to do the work. Not sure whether committing the added dependencies would be acceptable, but if it is, I also did the spark.ml port to gain all of the cross-validation / hyperparameter-tuning goodness. -SCR > Support weighted SVMWithSGD for classification of unbalanced dataset > > > Key: SPARK-3246 > URL: https://issues.apache.org/jira/browse/SPARK-3246 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 0.9.0, 1.0.2 >Reporter: mahesh bhole > > Please support weighted SVMWithSGD for binary classification of unbalanced > datasets. Though other options like undersampling or oversampling can be > used, it will be good if we can have a way to assign weights to the minority > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17999) Add getPreferredLocations for KafkaSourceRDD
[ https://issues.apache.org/jira/browse/SPARK-17999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17999: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > Add getPreferredLocations for KafkaSourceRDD > > > Key: SPARK-17999 > URL: https://issues.apache.org/jira/browse/SPARK-17999 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Saisai Shao >Assignee: Saisai Shao >Priority: Minor > Fix For: 2.0.2, 2.1.0 > > > The newly implemented Structured Streaming KafkaSource did calculate the > preferred locations for each topic partition, but didn't offer this > information through RDD's {{getPreferredLocations}} method. So here propose > to add this method in {{KafkaSourceRDD}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
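For reference, the RDD hook involved; a sketch of the shape of the change, assuming the preferred executor location is already computed per partition (not the actual patch):
{code}
import org.apache.spark.Partition

// Hypothetical partition type carrying the location KafkaSource computed.
class KafkaSourceRDDPartition(val index: Int, val preferredLoc: Option[String])
  extends Partition

// Inside KafkaSourceRDD (an RDD subclass), overriding getPreferredLocations
// lets the scheduler place each task on the executor that already holds the
// cached Kafka consumer for that topic partition:
//   override def getPreferredLocations(split: Partition): Seq[String] =
//     split.asInstanceOf[KafkaSourceRDDPartition].preferredLoc.toSeq
{code}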
[jira] [Updated] (SPARK-15698) Ability to remove old metadata for structure streaming MetadataLog
[ https://issues.apache.org/jira/browse/SPARK-15698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-15698: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > Ability to remove old metadata for structure streaming MetadataLog > -- > > Key: SPARK-15698 > URL: https://issues.apache.org/jira/browse/SPARK-15698 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Saisai Shao >Assignee: Saisai Shao > Fix For: 2.0.1, 2.1.0 > > > Current MetadataLog lacks the ability to remove old checkpoint file, we'd > better add this functionality to the MetadataLog and honor it in the place > where MetadataLog is used, that will reduce unnecessary small files in the > long running scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16963) Change Source API so that sources do not need to keep unbounded state
[ https://issues.apache.org/jira/browse/SPARK-16963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16963: - Component/s: (was: DStreams) Structured Streaming > Change Source API so that sources do not need to keep unbounded state > - > > Key: SPARK-16963 > URL: https://issues.apache.org/jira/browse/SPARK-16963 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.0.2, 2.1.0 > > > The version of the Source API in Spark 2.0.0 defines a single getBatch() > method for fetching records from the source, with the following Scaladoc > comments defining the semantics: > {noformat} > /** > * Returns the data that is between the offsets (`start`, `end`]. When > `start` is `None` then > * the batch should begin with the first available record. This method must > always return the > * same data for a particular `start` and `end` pair. > */ > def getBatch(start: Option[Offset], end: Offset): DataFrame > {noformat} > These semantics mean that a Source must retain all past history for the > stream that it backs. Further, a Source is also required to retain this data > across restarts of the process where the Source is instantiated, even when > the Source is restarted on a different machine. > These restrictions make it difficult to implement the Source API, as any > implementation requires potentially unbounded amounts of distributed storage. > See the mailing list thread at > [http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html] > for more information. > This JIRA will cover augmenting the Source API with an additional callback > that will allow Structured Streaming scheduler to notify the source when it > is safe to discard buffered data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
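A sketch of the kind of callback being proposed (name and default illustrative; see the linked thread for the discussion):
{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Offset

// Hypothetical shape of the augmented trait: once the sink has durably
// processed everything up to `end`, the scheduler notifies the source,
// which may then discard buffered data at or before that offset.
trait Source {
  def getBatch(start: Option[Offset], end: Offset): DataFrame

  // No-op default keeps existing Source implementations compiling.
  def commit(end: Offset): Unit = {}

  // schema, getOffset, stop, ... elided
}
{code}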
[jira] [Updated] (SPARK-17153) [Structured streams] readStream ignores partition columns
[ https://issues.apache.org/jira/browse/SPARK-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17153: - Component/s: (was: DStreams) Structured Streaming > [Structured streams] readStream ignores partition columns > - > > Key: SPARK-17153 > URL: https://issues.apache.org/jira/browse/SPARK-17153 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.0 >Reporter: Dmitri Carpov >Assignee: Liang-Chi Hsieh > Labels: release_notes, releasenotes > Fix For: 2.0.2, 2.1.0 > > > When parquet files are persisted using partitions, spark's `readStream` > returns data with all `null`s for the partitioned columns. > For example: > {noformat} > case class A(id: Int, value: Int) > val data = spark.createDataset(Seq( > A(1, 1), > A(2, 2), > A(2, 3)) > ) > val url = "/mnt/databricks/test" > data.write.partitionBy("id").parquet(url) > {noformat} > when the data is read as a stream: > {noformat} > spark.readStream.schema(spark.read.load(url).schema).parquet(url) > {noformat} > it reads: > {noformat} > id, value > null, 1 > null, 2 > null, 3 > {noformat} > A possible reason is that `readStream` reads the parquet files directly, but when those > are stored, the columns they are partitioned by are excluded from the files > themselves. In the given example the parquet files contain `value` information > only, since `id` is a partition column. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
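For contrast, a plain batch read over the same directory does recover the partition column through partition discovery, which is presumably what the stream should match (spark-shell sketch, reusing `url` from the description):
{code}
// Batch reads run partition discovery on the directory layout (id=1/, id=2/),
// so `id` comes back populated even though it is absent from the files.
spark.read.parquet(url).show()
{code}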
[jira] [Updated] (SPARK-17085) Documentation and actual code differs - Unsupported Operations
[ https://issues.apache.org/jira/browse/SPARK-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17085: - Component/s: (was: DStreams) Structured Streaming > Documentation and actual code differs - Unsupported Operations > -- > > Key: SPARK-17085 > URL: https://issues.apache.org/jira/browse/SPARK-17085 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 2.0.0 >Reporter: Samritti >Assignee: Jagadeesan A S >Priority: Minor > Fix For: 2.0.1, 2.1.0 > > > Spark Stuctured Streaming doc in this link > https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations > mentions > >>>"Right outer join with a streaming Dataset on the right is not supported" > but the code here conveys a different/opposite error > https://github.com/apache/spark/blob/5545b791096756b07b3207fb3de13b68b9a37b00/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L114 > >>>"Right outer join with a streaming DataFrame/Dataset on the left is " + > "not supported" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17475) HDFSMetadataLog should not leak CRC files
[ https://issues.apache.org/jira/browse/SPARK-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17475: - Component/s: (was: DStreams) Structured Streaming > HDFSMetadataLog should not leak CRC files > - > > Key: SPARK-17475 > URL: https://issues.apache.org/jira/browse/SPARK-17475 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.1 >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.1.0 > > > When HDFSMetadataLog uses a log directory on a filesystem other than HDFS > (i.e. NFS or the driver node's local filesystem), the class leaves orphan > checksum (CRC) files in the log directory. The files have names that follow > the pattern "..[long UUID hex string].tmp.crc". These files exist because > HDFSMetaDataLog renames other temporary files without renaming the > corresponding checksum files. There is one CRC file per batch, so the > directory fills up quite quickly. > I'm not certain, but this problem might also occur on certain versions of the > HDFS APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
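A sketch of the usual remedy with the Hadoop FileSystem API: remove the stale sidecar checksum when renaming the temp file, so no orphan `.crc` is left behind (illustrative; helper name hypothetical):
{code}
import org.apache.hadoop.fs.{FileSystem, Path}

// Checksummed filesystems name the sidecar ".<name>.crc" in the same
// directory; delete it by hand when rename() does not carry it along.
def renameWithCrcCleanup(fs: FileSystem, tmp: Path, dst: Path): Boolean = {
  val renamed = fs.rename(tmp, dst)
  val crc = new Path(tmp.getParent, s".${tmp.getName}.crc")
  if (renamed && fs.exists(crc)) fs.delete(crc, false)
  renamed
}
{code}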
[jira] [Updated] (SPARK-17513) StreamExecution should discard unneeded metadata
[ https://issues.apache.org/jira/browse/SPARK-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17513: - Component/s: (was: DStreams) Structured Streaming > StreamExecution should discard unneeded metadata > > > Key: SPARK-17513 > URL: https://issues.apache.org/jira/browse/SPARK-17513 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Frederick Reiss >Assignee: Frederick Reiss > Fix For: 2.0.1, 2.1.0 > > > The StreamExecution maintains a write-ahead log of batch metadata in order to > allow repeating previously in-flight batches if the driver is restarted. > StreamExecution does not garbage-collect or compact this log in any way. > Since the log is implemented with HDFSMetadataLog, these files will consume > memory on the HDFS NameNode. Specifically, each log file will consume about > 300 bytes of NameNode memory (150 bytes for the inode and 150 bytes for the > block of file contents; see > [https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html]. > An application with a 100 msec batch interval will increase the NameNode's > heap usage by about 250MB per day. > There is also the matter of recovery. StreamExecution reads its entire log > when restarting. This read operation will be very expensive if the log > contains millions of entries spread over millions of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
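For reference, the arithmetic behind the 250MB/day estimate:
{noformat}
100 msec batches => 10 files/sec * 86,400 sec/day = 864,000 log files/day
864,000 files/day * ~300 bytes/file ~= 260 MB/day of NameNode heap
{noformat}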
[jira] [Updated] (SPARK-18152) CLONE - FileStreamSource should not track the list of seen files indefinitely
[ https://issues.apache.org/jira/browse/SPARK-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18152: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - FileStreamSource should not track the list of seen files indefinitely > - > > Key: SPARK-18152 > URL: https://issues.apache.org/jira/browse/SPARK-18152 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > FileStreamSource currently tracks all the files seen indefinitely, which > means it can run out of memory or overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18030) Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite
[ https://issues.apache.org/jira/browse/SPARK-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18030: - Component/s: (was: DStreams) Structured Streaming > Flaky test: org.apache.spark.sql.streaming.FileStreamSourceSuite > - > > Key: SPARK-18030 > URL: https://issues.apache.org/jira/browse/SPARK-18030 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Reporter: Davies Liu >Assignee: Shixiong Zhu > Fix For: 2.0.2, 2.1.0 > > > https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.streaming.FileStreamSourceSuite&test_name=when+schema+inference+is+turned+on%2C+should+read+partition+data -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18151) CLONE - MetadataLog should support purging old logs
[ https://issues.apache.org/jira/browse/SPARK-18151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18151: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - MetadataLog should support purging old logs > --- > > Key: SPARK-18151 > URL: https://issues.apache.org/jira/browse/SPARK-18151 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Peter Lee > Fix For: 2.0.1, 2.1.0 > > > This is a useful primitive operation to have to support checkpointing and > forgetting old logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18153) CLONE - Ability to remove old metadata for structure streaming MetadataLog
[ https://issues.apache.org/jira/browse/SPARK-18153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18153: - Component/s: (was: DStreams) (was: SQL) Structured Streaming > CLONE - Ability to remove old metadata for structure streaming MetadataLog > -- > > Key: SPARK-18153 > URL: https://issues.apache.org/jira/browse/SPARK-18153 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Saisai Shao > Fix For: 2.0.1, 2.1.0 > > > Current MetadataLog lacks the ability to remove old checkpoint file, we'd > better add this functionality to the MetadataLog and honor it in the place > where MetadataLog is used, that will reduce unnecessary small files in the > long running scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18156) CLONE - StreamExecution should discard unneeded metadata
[ https://issues.apache.org/jira/browse/SPARK-18156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18156: - Component/s: (was: DStreams) Structured Streaming > CLONE - StreamExecution should discard unneeded metadata > > > Key: SPARK-18156 > URL: https://issues.apache.org/jira/browse/SPARK-18156 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Sunil Kumar >Assignee: Frederick Reiss > Fix For: 2.0.1, 2.1.0 > > > The StreamExecution maintains a write-ahead log of batch metadata in order to > allow repeating previously in-flight batches if the driver is restarted. > StreamExecution does not garbage-collect or compact this log in any way. > Since the log is implemented with HDFSMetadataLog, these files will consume > memory on the HDFS NameNode. Specifically, each log file will consume about > 300 bytes of NameNode memory (150 bytes for the inode and 150 bytes for the > block of file contents; see > [https://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html]. > An application with a 100 msec batch interval will increase the NameNode's > heap usage by about 250MB per day. > There is also the matter of recovery. StreamExecution reads its entire log > when restarting. This read operation will be very expensive if the log > contains millions of entries spread over millions of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18154) CLONE - Change Source API so that sources do not need to keep unbounded state
[ https://issues.apache.org/jira/browse/SPARK-18154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18154: - Component/s: (was: DStreams) Structured Streaming > CLONE - Change Source API so that sources do not need to keep unbounded state > - > > Key: SPARK-18154 > URL: https://issues.apache.org/jira/browse/SPARK-18154 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Sunil Kumar >Assignee: Frederick Reiss > Fix For: 2.0.2, 2.1.0 > > > The version of the Source API in Spark 2.0.0 defines a single getBatch() > method for fetching records from the source, with the following Scaladoc > comments defining the semantics: > {noformat} > /** > * Returns the data that is between the offsets (`start`, `end`]. When > `start` is `None` then > * the batch should begin with the first available record. This method must > always return the > * same data for a particular `start` and `end` pair. > */ > def getBatch(start: Option[Offset], end: Offset): DataFrame > {noformat} > These semantics mean that a Source must retain all past history for the > stream that it backs. Further, a Source is also required to retain this data > across restarts of the process where the Source is instantiated, even when > the Source is restarted on a different machine. > These restrictions make it difficult to implement the Source API, as any > implementation requires potentially unbounded amounts of distributed storage. > See the mailing list thread at > [http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html] > for more information. > This JIRA will cover augmenting the Source API with an additional callback > that will allow Structured Streaming scheduler to notify the source when it > is safe to discard buffered data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16849) Improve subquery execution by deduplicating the subqueries with the same results
[ https://issues.apache.org/jira/browse/SPARK-16849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-16849: Attachment: de-duplicating subqueries.pdf Design doc v1 > Improve subquery execution by deduplicating the subqueries with the same > results > > > Key: SPARK-16849 > URL: https://issues.apache.org/jira/browse/SPARK-16849 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > Attachments: de-duplicating subqueries.pdf > > > Subqueries in Spark SQL are each executed even when they share the same > physical plan and produce the same results. We should be able to deduplicate > subqueries that are referenced multiple times within a query.
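To make the duplication concrete, a small example of the pattern this improvement targets; the table and column names are invented for illustration. Both scalar subqueries below are identical, yet each is currently planned and executed on its own:
{code}
// Hypothetical tables/columns; both scalar subqueries compute the same value.
val df = spark.sql("""
  SELECT name
  FROM employees
  WHERE salary > (SELECT avg(salary) FROM employees)
     OR bonus  > (SELECT avg(salary) FROM employees)
""")
{code}
With deduplication, the shared subquery would run once and feed both predicates.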
[jira] [Resolved] (SPARK-17772) Add helper testing methods for instance weighting
[ https://issues.apache.org/jira/browse/SPARK-17772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-17772. - Resolution: Fixed Fix Version/s: 2.2.0 > Add helper testing methods for instance weighting > - > > Key: SPARK-17772 > URL: https://issues.apache.org/jira/browse/SPARK-17772 > Project: Spark > Issue Type: Test > Components: ML >Reporter: Seth Hendrickson >Assignee: Seth Hendrickson >Priority: Minor > Fix For: 2.2.0 > > > More and more ML algorithms accept instance weights. We keep replicating code > to test instance weighting in every test suite, which will get out of hand > rather quickly. We can and should implement generic instance-weighting test > helper methods so that we can reduce duplicated code and standardize these > tests.
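For illustration, one possible shape for such a helper, checking the common invariant that fitting with a weight column matches fitting on physically replicated rows; the method name and parameters are assumptions, not the committed API:
{code}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.sql.DataFrame

// Sketch of a shared test helper (names are assumptions). Fitting an
// estimator configured with weightCol = "weight" on weighted data should
// produce (approximately) the same model as fitting an identically
// configured, unweighted estimator on data where each row has been
// physically replicated `weight` times.
def testOversamplingVsWeighting[M <: Model[M]](
    weightedEstimator: Estimator[M],   // weightCol already set to "weight"
    plainEstimator: Estimator[M],      // same params, no weight column
    weightedData: DataFrame,
    oversampledData: DataFrame,
    modelEquals: (M, M) => Unit): Unit = {
  modelEquals(weightedEstimator.fit(weightedData),
              plainEstimator.fit(oversampledData))
}
{code}
Each suite would supply its own modelEquals check (e.g. comparing coefficients within a tolerance), which is where most of the duplicated code lives today.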
[jira] [Resolved] (SPARK-17645) Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE)
[ https://issues.apache.org/jira/browse/SPARK-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-17645. - Resolution: Fixed Fix Version/s: 2.2.0 > Add feature selector methods based on: False Discovery Rate (FDR) and Family > Wise Error rate (FWE) > -- > > Key: SPARK-17645 > URL: https://issues.apache.org/jira/browse/SPARK-17645 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Peng Meng >Assignee: Peng Meng >Priority: Minor > Fix For: 2.2.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > Univariate feature selection works by selecting the best features based on > univariate statistical tests. > FDR and FWE are popular univariate statistical tests for feature selection. > In 2005, the Benjamini and Hochberg paper on FDR was identified as one of the > 25 most-cited statistical papers. The FDR implementation in this PR uses the > Benjamini-Hochberg procedure > (https://en.wikipedia.org/wiki/False_discovery_rate). > In statistics, FWE is the probability of making one or more false > discoveries, or type I errors, among all the hypotheses when performing > multiple hypothesis tests > (https://en.wikipedia.org/wiki/Family-wise_error_rate). > This PR adds FDR and FWE methods to ChiSqSelector, as implemented in > scikit-learn: > http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
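For reference, a standalone sketch of the Benjamini-Hochberg rule mentioned above (not the actual ChiSqSelector code): with m p-values sorted ascending, select every feature ranked at or below the largest k satisfying p(k) <= (k/m) * alpha.
{code}
// Standalone Benjamini-Hochberg sketch: returns indices of selected features.
def selectByFdr(pValues: Array[Double], alpha: Double): Array[Int] = {
  val m = pValues.length
  val ranked = pValues.zipWithIndex.sortBy(_._1)  // ascending p-values
  // Largest 1-based rank k with p(k) <= (k / m) * alpha, or 0 if none.
  val maxRank = (1 to m).foldLeft(0) { (best, k) =>
    if (ranked(k - 1)._1 <= k.toDouble / m * alpha) k else best
  }
  ranked.take(maxRank).map(_._2).sorted  // original feature indices
}
{code}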
[jira] [Assigned] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17642: Assignee: Apache Spark > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang >Assignee: Apache Spark > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Assigned] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17642: Assignee: (was: Apache Spark) > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Commented] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782943#comment-15782943 ] Apache Spark commented on SPARK-17642: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/16422 > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
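For a sense of the intended usage, a hypothetical invocation of the proposed command; the table and column names are made up, and the listed statistics are only what one would expect to see once ANALYZE has collected column-level stats:
{code}
// Hypothetical usage of the proposed command (names invented):
spark.sql("DESC FORMATTED customers age").show()
// expected to display column-level statistics for customers.age,
// e.g. distinct count, null count, min/max, avg/max column length
{code}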
[jira] [Created] (SPARK-19015) SQL query with transformation cannot be executed without first running a table scan
lakhdar adil created SPARK-19015: Summary: SQL query with transformation cannot be executed without first running a table scan Key: SPARK-19015 URL: https://issues.apache.org/jira/browse/SPARK-19015 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: lakhdar adil Hello, I have a Spark Streaming job that consumes from Kafka and sends results to Elasticsearch. I run a UNION query between two tables, "statswithrowid" and "queryes": sqlContext.sql(s"select id, rowid,agentId,datecalcul,'KAFKA' as source from statswithrowid where id IN ($ids) and agentId = '$agent' UNION select id, rowid,agentId,datecalcul, 'ES' as source from queryes where agentId = '$agent'") This query cannot be executed on its own. Today I first have to run the following two queries so that the union query works. Query on the "statswithrowid" table: sqlContext.sql(s"select id, rowid,agentId,datecalcul,'KAFKA' as source from statswithrowid where id IN ($ids) and agentId = '$agent'").show() Query on the "queryes" table: sqlContext.sql(s"select id, rowid,agentId,datecalcul, 'ES' as source from queryes where agentId = '$agent'").show() For information: if I don't call .show() on these two queries before the union, nothing works. Why do I need to run them first before the union query? What is the best way to work with union queries? I also tried a union on DataFrames and hit the same problem. I look forward to your reply. Thank you in advance. Best regards, Adil LAKHDAR
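For comparison, the DataFrame-level formulation the reporter says fails the same way; a sketch using the Spark 1.6 API (unionAll was renamed union in 2.0), assuming the two temp tables are already registered:
{code}
// DataFrame equivalent of the SQL UNION above (Spark 1.6 API).
val kafkaDf = sqlContext.table("statswithrowid")
  .where(s"id IN ($ids) AND agentId = '$agent'")
  .selectExpr("id", "rowid", "agentId", "datecalcul", "'KAFKA' AS source")
val esDf = sqlContext.table("queryes")
  .where(s"agentId = '$agent'")
  .selectExpr("id", "rowid", "agentId", "datecalcul", "'ES' AS source")
// SQL UNION deduplicates, so distinct() is needed to match it exactly.
val union = kafkaDf.unionAll(esDf).distinct()
{code}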
[jira] [Commented] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782927#comment-15782927 ] Apache Spark commented on SPARK-19014: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/16417 > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >
[jira] [Assigned] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19014: Assignee: Wenchen Fan (was: Apache Spark) > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >
[jira] [Assigned] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-19014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19014: Assignee: Apache Spark (was: Wenchen Fan) > support complex aggregate buffer in HashAggregateExec > - > > Key: SPARK-19014 > URL: https://issues.apache.org/jira/browse/SPARK-19014 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark >
[jira] [Created] (SPARK-19014) support complex aggregate buffer in HashAggregateExec
Wenchen Fan created SPARK-19014: --- Summary: support complex aggregate buffer in HashAggregateExec Key: SPARK-19014 URL: https://issues.apache.org/jira/browse/SPARK-19014 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Updated] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. was: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command. > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Updated] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics are supported. was: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We should resolve this jira after column-level statistics including histograms are supported. > support DESC FORMATTED TABLE COLUMN command to show column-level statistics > --- > > Key: SPARK-17642 > URL: https://issues.apache.org/jira/browse/SPARK-17642 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Zhenhua Wang > > Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. > We should resolve this jira after column-level statistics are supported.
[jira] [Assigned] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-18993: - Assignee: Sean Owen > Unable to build/compile Spark in IntelliJ due to missing Scala deps in > spark-tags > - > > Key: SPARK-18993 > URL: https://issues.apache.org/jira/browse/SPARK-18993 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Xiao Li >Assignee: Sean Owen >Priority: Critical > Fix For: 2.0.3, 2.1.1, 2.2.0 > > > After https://github.com/apache/spark/pull/16311 was merged, I am unable to > build Spark in IntelliJ. I get the following compilation error: > {noformat} > Error:scalac: error while loading Object, Missing dependency 'object scala in > compiler mirror', required by > /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class) > Error:scalac: Error: object scala in compiler mirror not found. > scala.reflect.internal.MissingRequirementError: object scala in compiler > mirror not found. > at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66) > at > scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395) > at scala.tools.nsc.Global$Run.(Global.scala:1215) > at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:94) > at xsbt.CompilerInterface.run(CompilerInterface.scala:22) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) > at > org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29) > at > org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) > at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67) > at > org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24) > at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) > {noformat}
[jira] [Resolved] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18993. --- Resolution: Fixed Fix Version/s: 2.2.0 2.0.3 2.1.1 Issue resolved by pull request 16418 [https://github.com/apache/spark/pull/16418] > Unable to build/compile Spark in IntelliJ due to missing Scala deps in > spark-tags > - > > Key: SPARK-18993 > URL: https://issues.apache.org/jira/browse/SPARK-18993 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Xiao Li >Priority: Critical > Fix For: 2.1.1, 2.0.3, 2.2.0 > > > After https://github.com/apache/spark/pull/16311 was merged, I am unable to > build Spark in IntelliJ. I get the following compilation error: > {noformat} > Error:scalac: error while loading Object, Missing dependency 'object scala in > compiler mirror', required by > /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class) > Error:scalac: Error: object scala in compiler mirror not found. > scala.reflect.internal.MissingRequirementError: object scala in compiler > mirror not found. > at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66) > at > scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395) > at scala.tools.nsc.Global$Run.(Global.scala:1215) > at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:94) > at xsbt.CompilerInterface.run(CompilerInterface.scala:22) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) > at > org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29) > at > org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) > at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67) > at > org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24) > at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at
com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) > {noformat}