[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532810711

> @umehrot2 I think balaji has his hands full with the release atm. Do you have bandwidth to try moving to spark 2.4 and do these changes on top?

Sure. I will take this up then.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532808342

@vinothchandar At EMR we do not have a use-case to support Spark 2.3 or earlier. We would be offering Hudi starting with our latest release which has Spark 2.4.3. Anything earlier than this we would not be supporting. So, it might be a good idea to just move to 2.4 and drop support for earlier versions.
[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532475222

@vinothchandar At the moment, I cannot think of a good way to upgrade the avro version while still continuing to support Spark 2.3 or earlier. What @cdmikechen has mentioned, asking users to take the additional step of dropping `avro 1.8.2` jars in spark's classpath, could be one option.

If we agree that it is fine, either @cdmikechen or I can create a new PR based off this one, with the following changes:
- Upgrade the parquet version
- Roll back the Timestamp conversion to Logical Type, and continue to support it as a String

With these two changes, this PR appears to be in a state to be merged. We can continue on the Timestamp issue in a separate Jira/PR.
[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532354352

@vinothchandar @cdmikechen I was able to read and write the `Decimal` type correctly by upgrading the parquet version to `1.8.2`. This PR needs to be updated accordingly.

Is there a way we can prioritize this work and get it merged? Is there any additional testing I can help perform that would give us confidence that it can be merged?

@cdmikechen you mentioned there are still some issues. If you can point them out here, I would be willing to help out with those as well.
[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532026904

> @umehrot2 Great analysis.. Would upgrading parquet-avro help?

Good point @vinothchandar. From a quick look at the `AvroSchemaConverter` code in `parquet-avro`, it seems that handling of `LogicalType` conversion was added in parquet 1.8.2:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L379

It is not there in `parquet 1.8.1`, which is what Hudi uses right now:
https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java

I will upgrade the parquet version and test. Will update here with what I find.
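
For readers following the `LogicalType` discussion above, here is a minimal sketch of why the annotation matters for a type like `Decimal`. It is not code from Hudi or `parquet-avro`; it only illustrates the encoding described in the Avro specification, where a `decimal` value is stored as the big-endian two's-complement bytes of its unscaled integer and the scale lives in the schema. The function names `encode_avro_decimal` and `decode_avro_decimal` are hypothetical helpers made up for this illustration.

```python
from decimal import Decimal


def encode_avro_decimal(value: Decimal, scale: int) -> bytes:
    """Encode `value` as the bytes of its unscaled integer (hypothetical helper).

    Follows the Avro spec's `decimal` logical type: big-endian,
    two's-complement, with the scale kept in the schema rather than the data.
    """
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more fractional digits than scale {scale}")
    unscaled = int(scaled)
    # Enough bytes to hold the magnitude plus a sign bit.
    nbytes = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(nbytes, byteorder="big", signed=True)


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode bytes back to a Decimal using the scale from the schema."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    raw = encode_avro_decimal(Decimal("123.45"), scale=2)
    # A reader that knows the logical type (and its scale) recovers the value.
    print(decode_avro_decimal(raw, scale=2))
    # A reader that does not know the logical type only sees opaque bytes.
    print(raw)
```

The point of the sketch is that the stored bytes carry no scale of their own. If a schema converter drops the logical type annotation, which is what the comment above suggests happens with `parquet 1.8.1`, a downstream reader has nothing to tell it that the bytes represent a decimal.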
[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532014549

> should we first rebase and resolve the conflicts?

For my testing, I rebased this PR on top of release-0.5.0. But yes, @cdmikechen should maybe rebase the PR. The issue will still exist, though.