[ https://issues.apache.org/jira/browse/SPARK-19383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842488#comment-15842488 ]

Sean Owen commented on SPARK-19383:
-----------------------------------

This sounds like unsupported syntax. I'm not even sure "PER PARTITION LIMIT" 
exists in Hive?
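
For context: PER PARTITION LIMIT is CQL syntax added in Cassandra 3.6 
(CASSANDRA-7017); it is not part of HiveQL or Spark SQL's grammar, so Catalyst 
rejects it at parse time and the clause never reaches Cassandra. If the limit 
must be applied server-side, one sidestep is to execute the CQL through the 
driver session the connector manages. A minimal sketch, assuming a local node 
and a hypothetical keyspace test_ks (table and columns taken from the report):

{code:title=RawCqlWorkaround.java|borderStyle=solid}
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;
import org.apache.spark.SparkConf;

public final class RawCqlWorkaround {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("per-partition-limit-test")
                .setMaster("local[*]")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // assumption: local node
        // openSession() returns a plain DataStax driver Session, so the
        // statement below is parsed by Cassandra itself, not by Catalyst.
        CassandraConnector connector = CassandraConnector.apply(conf);
        try (Session session = connector.openSession()) {
            ResultSet rs = session.execute(
                    "SELECT item_uuid, time_series_date, item_uri " +
                    "FROM test_ks.perPartitionLimitTests " + // test_ks is hypothetical
                    "PER PARTITION LIMIT 1");
            for (Row row : rs) {
                System.out.println(row);
            }
        }
    }
}
{code}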

> Spark SQL fails with Cassandra 3.6 and later PER PARTITION LIMIT option
> ------------------------------------------------------------------------
>
>                 Key: SPARK-19383
>                 URL: https://issues.apache.org/jira/browse/SPARK-19383
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: PER PARTITION LIMIT error documented on GitHub and 
> reproducible by cloning: 
> [BrentDorsey/cassandra-spark-job|https://github.com/BrentDorsey/cassandra-spark-job]
> Java 1.8
> Cassandra Version
> [cqlsh 5.0.1 | Cassandra 3.9.0 | CQL spec 3.4.2 | Native protocol v4]
> {code:title=POM.xml|borderStyle=solid}
>         <dependency>
>             <groupId>com.datastax.spark</groupId>
>             <artifactId>spark-cassandra-connector_2.10</artifactId>
>             <version>2.0.0-M3</version>
>         </dependency>
>         <dependency>
>             <groupId>com.datastax.cassandra</groupId>
>             <artifactId>cassandra-driver-mapping</artifactId>
>             <version>3.1.2</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.hadoop</groupId>
>             <artifactId>hadoop-common</artifactId>
>             <version>2.7.2</version>
>             <scope>compile</scope>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-catalyst_2.10</artifactId>
>             <version>2.0.2</version>
>             <scope>compile</scope>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-core_2.10</artifactId>
>             <version>2.0.2</version>
>             <scope>compile</scope>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-sql_2.10</artifactId>
>             <version>2.0.2</version>
>             <scope>compile</scope>
>         </dependency>
> {code}
>            Reporter: Brent Dorsey
>            Priority: Minor
>              Labels: Cassandra
>
> Attempting to use version 2.0.0-M3 of the datastax/spark-cassandra-connector 
> to select the most recent version of each partition key with the Cassandra 
> 3.6+ PER PARTITION LIMIT option fails. I have tried all of the Cassandra Java 
> RDDs as well as Spark SQL, with and without partition-key equality 
> constraints; every attempt failed with syntax errors and/or start/end bound 
> restriction errors.
> The 
> [BrentDorsey/cassandra-spark-job|https://github.com/BrentDorsey/cassandra-spark-job]
>  repo contains working code that demonstrates the error. Clone the repo, 
> create the keyspace and table locally, supply connection information, and 
> run main.
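> A window function gives an equivalent "newest row per partition key" result 
> that Catalyst can parse. A minimal sketch against the Dataset loaded from the 
> table (column names taken from the failing query; the helper name is mine):
> {code:title=WindowWorkaround.java|borderStyle=solid}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.expressions.Window;
> import org.apache.spark.sql.expressions.WindowSpec;
> import static org.apache.spark.sql.functions.col;
> import static org.apache.spark.sql.functions.row_number;
>
> public final class WindowWorkaround {
>     // Emulates CQL's PER PARTITION LIMIT 1 on the Spark side: keep only
>     // the newest row for each item_uuid.
>     public static Dataset<Row> latestPerPartitionKey(Dataset<Row> ds) {
>         WindowSpec newestFirst = Window.partitionBy(col("item_uuid"))
>                 .orderBy(col("time_series_date").desc());
>         return ds.withColumn("rn", row_number().over(newestFirst))
>                 .where(col("rn").equalTo(1))
>                 .drop("rn");
>     }
> }
> {code}
> Unlike PER PARTITION LIMIT, this reads every row out of Cassandra and filters 
> in Spark, trading read volume for a query Spark can actually parse.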
> Spark Dataset .where & Spark SQL errors:
> {code:title=errors|borderStyle=solid}
> ERROR [2017-01-27 06:35:19,919] (main) 
> org.per.partition.limit.test.spark.job.Main: 
> getSparkDatasetPerPartitionLimitTestWithTokenGreaterThan failed.
> org.apache.spark.sql.catalyst.parser.ParseException: 
> mismatched input 'PARTITION' expecting <EOF>(line 1, pos 67)
> == SQL ==
> TOKEN(item_uuid) > TOKEN(6616b548-4fd1-4661-a938-0af3c77357f7) PER PARTITION 
> LIMIT 1
> -------------------------------------------------------------------^^^
>       at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>       at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>       at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>       at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:43)
>       at org.apache.spark.sql.Dataset.where(Dataset.scala:1153)
>       at 
> org.per.partition.limit.test.spark.job.Main.getSparkDatasetPerPartitionLimitTestWithTokenGreaterThan(Main.java:349)
>       at org.per.partition.limit.test.spark.job.Main.run(Main.java:128)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>       at org.per.partition.limit.test.spark.job.Main.main(Main.java:72)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> ERROR [2017-01-27 06:35:20,238] (main) 
> org.per.partition.limit.test.spark.job.Main: 
> getSparkSqlDatasetPerPartitionLimitTest failed.
> org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input ''' expecting {'(', 'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 
> 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 
> 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 
> 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 
> 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 
> 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 
> 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
> 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
> 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', '+', '-', '*', 'DIV', 
> '~', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 
> 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 
> 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 
> 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 
> 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 
> TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 
> 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 
> 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 
> 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 
> 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 
> 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 
> 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 
> 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 
> 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 
> 'CURRENT_DATE', 'CURRENT_TIMESTAMP', STRING, BIGINT_LITERAL, 
> SMALLINT_LITERAL, TINYINT_LITERAL, INTEGER_VALUE, DECIMAL_VALUE, 
> SCIENTIFIC_DECIMAL_VALUE, DOUBLE_LITERAL, BIGDECIMAL_LITERAL, IDENTIFIER, 
> BACKQUOTED_IDENTIFIER}(line 1, pos 36)
> == SQL ==
> SELECT item_uuid, time_series_date, 'item_uri FROM perPartitionLimitTests PER 
> PARTITION LIMIT 1
> ------------------------------------^^^
>       at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
>       at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
>       at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
>       at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
>       at 
> org.per.partition.limit.test.spark.job.Main.getSparkSqlDatasetPerPartitionLimitTest(Main.java:367)
>       at org.per.partition.limit.test.spark.job.Main.run(Main.java:129)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>       at org.per.partition.limit.test.spark.job.Main.main(Main.java:72)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> {code}
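> Note: both stack traces originate in Spark's own parser (ParseDriver.scala), 
> before anything is sent to Cassandra. Dataset.where(String) routes the 
> predicate through Catalyst's parseExpression, and SparkSession.sql parses the 
> whole statement as Spark SQL, so the CQL-only PER PARTITION LIMIT clause is 
> rejected client-side. (The second query also contains a stray quote before 
> item_uri, which is what the "extraneous input" error at pos 36 points at; 
> fixing it would still leave the PER PARTITION LIMIT parse failure.)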


