[ https://issues.apache.org/jira/browse/SPARK-19383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842488#comment-15842488 ]
Sean Owen commented on SPARK-19383:
-----------------------------------

This sounds like unsupported syntax. I'm not even sure "PER PARTITION LIMIT" exists in Hive?

> Spark SQL fails with Cassandra 3.6 and later PER PARTITION LIMIT option
> ------------------------------------------------------------------------
>
>                 Key: SPARK-19383
>                 URL: https://issues.apache.org/jira/browse/SPARK-19383
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: PER PARTITION LIMIT error documented on GitHub and reproducible by cloning:
> [BrentDorsey/cassandra-spark-job|https://github.com/BrentDorsey/cassandra-spark-job]
> Java 1.8
> Cassandra version:
> [cqlsh 5.0.1 | Cassandra 3.9.0 | CQL spec 3.4.2 | Native protocol v4]
> {code:title=POM.xml|borderStyle=solid}
> <dependency>
>     <groupId>com.datastax.spark</groupId>
>     <artifactId>spark-cassandra-connector_2.10</artifactId>
>     <version>2.0.0-M3</version>
> </dependency>
> <dependency>
>     <groupId>com.datastax.cassandra</groupId>
>     <artifactId>cassandra-driver-mapping</artifactId>
>     <version>3.1.2</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-common</artifactId>
>     <version>2.72</version>
>     <scope>compile</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-catalyst_2.10</artifactId>
>     <version>2.0.2</version>
>     <scope>compile</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.10</artifactId>
>     <version>2.0.2</version>
>     <scope>compile</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.10</artifactId>
>     <version>2.0.2</version>
>     <scope>compile</scope>
> </dependency>
> {code}
>            Reporter: Brent Dorsey
>            Priority: Minor
>              Labels: Cassandra
>
> Attempting to use version 2.0.0-M3 of the datastax/spark-cassandra-connector to select the most recent version of each partition key using the Cassandra
> 3.6 and later PER PARTITION LIMIT option fails. I've tried using all the Cassandra Java RDDs and Spark SQL, with and without partition-key equality constraints. All attempts have failed due to syntax errors and/or start/end bound restriction errors.
> The [BrentDorsey/cassandra-spark-job|https://github.com/BrentDorsey/cassandra-spark-job] repo contains working code that demonstrates the error. Clone the repo, create the keyspace and table locally, supply connection information, then run main.
> Spark Dataset .where & Spark SQL errors:
> {code:title=errors|borderStyle=solid}
> ERROR [2017-01-27 06:35:19,919] (main) org.per.partition.limit.test.spark.job.Main: getSparkDatasetPerPartitionLimitTestWithTokenGreaterThan failed.
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'PARTITION' expecting <EOF>(line 1, pos 67)
>
> == SQL ==
> TOKEN(item_uuid) > TOKEN(6616b548-4fd1-4661-a938-0af3c77357f7) PER PARTITION LIMIT 1
> -------------------------------------------------------------------^^^
>
> at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseExpression(ParseDriver.scala:43)
> at org.apache.spark.sql.Dataset.where(Dataset.scala:1153)
> at org.per.partition.limit.test.spark.job.Main.getSparkDatasetPerPartitionLimitTestWithTokenGreaterThan(Main.java:349)
> at org.per.partition.limit.test.spark.job.Main.run(Main.java:128)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.per.partition.limit.test.spark.job.Main.main(Main.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> ERROR [2017-01-27 06:35:20,238] (main) org.per.partition.limit.test.spark.job.Main: getSparkSqlDatasetPerPartitionLimitTest failed.
> org.apache.spark.sql.catalyst.parser.ParseException:
> extraneous input ''' expecting {'(', 'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', '+', '-', '*', 'DIV', '~', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', STRING, BIGINT_LITERAL, SMALLINT_LITERAL, TINYINT_LITERAL, INTEGER_VALUE, DECIMAL_VALUE, SCIENTIFIC_DECIMAL_VALUE, DOUBLE_LITERAL, BIGDECIMAL_LITERAL, IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 36)
>
> == SQL ==
> SELECT item_uuid, time_series_date, 'item_uri FROM perPartitionLimitTests PER PARTITION LIMIT 1
> ------------------------------------^^^
>
> at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
> at org.per.partition.limit.test.spark.job.Main.getSparkSqlDatasetPerPartitionLimitTest(Main.java:367)
> at org.per.partition.limit.test.spark.job.Main.run(Main.java:129)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.per.partition.limit.test.spark.job.Main.main(Main.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
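The failing queries are trying to keep only the newest row per partition key, and Spark's ANTLR grammar (as the long token list in the second error shows) simply has no PER PARTITION LIMIT clause, so that Cassandra-side clause has to be emulated on the Spark side, typically with a window function such as {{row_number() OVER (PARTITION BY item_uuid ORDER BY time_series_date DESC)}} filtered to 1. A minimal sketch of that semantics in plain Python follows; the rows and column names are hypothetical stand-ins mirroring the issue's {{perPartitionLimitTests}} table, not code from the reporter's repo:

```python
from collections import defaultdict
from operator import itemgetter

def per_partition_limit(rows, key, order_by, limit=1):
    """Emulate Cassandra's PER PARTITION LIMIT n: keep the `limit`
    newest rows (by `order_by`, descending) within each partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    result = []
    for group in partitions.values():
        # Newest first, then truncate each partition to `limit` rows.
        group.sort(key=itemgetter(order_by), reverse=True)
        result.extend(group[:limit])
    return result

# Hypothetical data: two versions of partition "a", one of partition "b".
rows = [
    {"item_uuid": "a", "time_series_date": "2017-01-25", "item_uri": "/v1"},
    {"item_uuid": "a", "time_series_date": "2017-01-27", "item_uri": "/v2"},
    {"item_uuid": "b", "time_series_date": "2017-01-26", "item_uri": "/v1"},
]
latest = per_partition_limit(rows, key="item_uuid", order_by="time_series_date")
# Yields one row per item_uuid: the 2017-01-27 row for "a" and the only row for "b".
```

The same shape expressed as a window query runs on Spark 2.0.x, since window functions are supported even though the Cassandra clause is not; pushing the real PER PARTITION LIMIT down to Cassandra would have to go through CQL directly rather than through Spark SQL.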