[GitHub] [hudi] parisni commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-10-14 Thread GitBox


parisni commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-943156528


   @nsivabalan Also, I should mention this is OSS Spark 2.4.4 with the metastore client overridden to use AWS Glue, via https://github.com/awslabs/aws-glue-libs, to connect Spark to Glue.
   This might be related.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [hudi] parisni commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-10-13 Thread GitBox


parisni commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-942307668


   @nsivabalan
   We are using OSS Spark 2.4.4 on EMR 5.
   The Hudi bundle is: `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.9.0,org.apache.spark:spark-avro_2.11:2.4.4`
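
   For reference, a full launch command with those coordinates might look like the sketch below. Only the `--packages` list is taken from the comment; the `KryoSerializer` setting is the one the Hudi quickstart recommends, not something stated in this thread.

   ```shell
   # Hypothetical pyspark launch using the bundle coordinates quoted above.
   # spark.serializer=KryoSerializer follows the Hudi quickstart recommendation.
   pyspark \
     --packages org.apache.hudi:hudi-spark-bundle_2.11:0.9.0,org.apache.spark:spark-avro_2.11:2.4.4 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```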
 






[GitHub] [hudi] parisni commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-10-13 Thread GitBox


parisni commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-942282671


   same issue here:
   ```
   # This fails with the above error
   spark.sql("select * from my_table_rt").show() 
   # also this fails with the same error
   spark.read.format("hudi").load(my_table_path).show()
   # this works 
   spark.sql("select * from my_table_ro").show()
   ```
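
   Since the `_ro` query succeeds while the `_rt` and snapshot reads fail, one possible workaround sketch (untested, assuming an active `spark` session and the `my_table_path` placeholder from above) is to force the read-optimized query type on the datasource path, which avoids the merge-on-read snapshot relation that raises the error:

   ```python
   # Workaround sketch (untested): request the read-optimized view instead of
   # the default snapshot view, so MergeOnReadSnapshotRelation is never built.
   # "hoodie.datasource.query.type" is a documented Hudi 0.9 read option.
   df = (spark.read.format("hudi")
         .option("hoodie.datasource.query.type", "read_optimized")
         .load(my_table_path))
   df.show()
   ```

   Note this returns only compacted base-file data, like the `_ro` table does, so it is a stopgap rather than a fix.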
   
   Using our own Spark 2.4.4 build, compiled with the AWS Glue metastore client, on EMR 5.
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 380, in show
       print(self._jdf.showString(n, 20, vertical))
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o138.showString.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
       at org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$7.apply(MergeOnReadSnapshotRelation.scala:217)
       at org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$7.apply(MergeOnReadSnapshotRelation.scala:209)
       at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
       at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
       at scala.collection.immutable.List.foreach(List.scala:392)
       at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
       at scala.collection.immutable.List.map(List.scala:296)
       at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:209)
       at org.apache.hudi.MergeOnReadSnapshotRelation.buildScan(MergeOnReadSnapshotRelation.scala:110)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:309)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:309)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:342)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:341)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:419)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:337)
       at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:305)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
       at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
       at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
       at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
       at scala.collection.Iterator$class.foreach(Iterator.scala:891)
       at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
       at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
       at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
       at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
       at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
       at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)