[ https://issues.apache.org/jira/browse/PHOENIX-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406837#comment-15406837 ]
Josh Mahonin commented on PHOENIX-2547:
---------------------------------------

Nice work [~kalyanhadoop]. Very straightforward, and the unit tests should be enough coverage to make sure nothing's broken here. I'd like to get a chance to compile, run, and put it through an internal regression test before a +1; I should be able to confirm this week.

> Spark Data Source API: Filter operation doesn't work for column names
> containing a white space
> ----------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2547
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2547
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: Suhas Nalapure
>            Assignee: Josh Mahonin
>            Priority: Critical
>              Labels: verify
>             Fix For: 4.9.0
>
>         Attachments: phoenix_spark.patch
>
>
> DataFrame.filter() results in "org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "LPAREN", got "first" at line 1, column 52." when a column name has a white space in it.
>
> Steps to Reproduce
> --------------------------
> 1. Create a test table and insert a row as below:
>      create table "space" ("key" varchar primary key, "first name" varchar);
>      upsert into "space" values ('key1', 'xyz');
> 2. Java code that leads to the error:
>      // omitting the DataFrame creation part
>      df = df.filter(df.col("first name").equalTo("xyz"));
>      System.out.println(df.collectAsList());
> 3.
> I could see the following statements in the Phoenix logs which may have led to the exception (stack trace given below):
>
> 2015-12-28 17:52:24,327 INFO [main] org.apache.phoenix.mapreduce.PhoenixInputFormat UseSelectColumns=true, selectColumnList.size()=2, selectColumnList=key,first name
> 2015-12-28 17:52:24,328 INFO [main] org.apache.phoenix.mapreduce.PhoenixInputFormat Select Statement: SELECT "key","0"."first name" FROM "space" WHERE ( first name = 'xyz')
> 2015-12-28 17:52:24,333 ERROR [main] org.apache.phoenix.mapreduce.PhoenixInputFormat Failed to get the query plan with error [ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "LPAREN", got "first" at line 1, column 52.]
>
> Exception Stack Trace:
> ------------------------------
> java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "LPAREN", got "first" at line 1, column 52.
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:80)
>     at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.phoenix.spark.PhoenixRDD.getPartitions(PhoenixRDD.scala:48)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
>     at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1395)
>     at org.apache.spark.sql.DataFrame$$anonfun$collectAsList$1.apply(DataFrame.scala:1395)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>     at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>     at org.apache.spark.sql.DataFrame.collectAsList(DataFrame.scala:1394)
>     at com.dataken.dataframe.dao.DataFrameDAOTest.testPhoenixGetDataFrame(DataFrameDAOTest.java:167)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "LPAREN", got "first" at line 1, column 52.
>     at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
>     at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1229)
>     at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1310)
>     at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1320)
>     at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1315)
>     at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:118)
>     ... 78 more
> Caused by: MismatchedTokenException(60!=90)
>     at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:351)
>     at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>     at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6403)
>     at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6223)
>     at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6160)
>     at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6125)
>     at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6405)
>     at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6223)
>     at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6160)
>     at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6125)
>     at org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4328)
>     at org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4410)
>     at org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4475)
>     at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:757)
>     at org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:499)
>     at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
>     ... 83 more

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
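Editor's note on the root cause, as the logs above show it: the pushed-down filter was rendered as `WHERE ( first name = 'xyz')`, but Phoenix only accepts identifiers containing spaces (or lower-case characters) when they are double-quoted, as in `WHERE ("first name" = 'xyz')`. A minimal sketch of the required quoting follows; the `quote` helper is hypothetical, illustrating the idea rather than reproducing the attached phoenix_spark.patch:

```java
// Hypothetical helper illustrating Phoenix identifier quoting; this is
// NOT the code from phoenix_spark.patch, only a sketch of the concept.
public class IdentifierQuoting {

    // Wrap an identifier in double quotes, doubling any embedded quotes,
    // so the Phoenix parser treats it as a single case-sensitive name.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The connector generated: WHERE ( first name = 'xyz')   -- parse error
        // Quoting the identifier yields a clause Phoenix can parse:
        System.out.println("WHERE (" + quote("first name") + " = 'xyz')");
        // prints: WHERE ("first name" = 'xyz')
    }
}
```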