[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532810711
 
 
   > @umehrot2 I think balaji has his hands full with the release atm. Do you 
have bandwidth to try moving to spark 2.4 and do these changes on top?
   
   Sure. I will take this up then.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532808342
 
 
   @vinothchandar At EMR, we do not have a use case for supporting Spark 2.3 or 
earlier. We would be offering Hudi starting with our latest release, which has 
Spark 2.4.3, and we would not be supporting anything earlier than that.
   
   So, it might be a good idea to just move to 2.4 and drop support for earlier 
versions.


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532475222
 
 
   @vinothchandar At the moment, I cannot think of a good way to upgrade the 
Avro version while continuing to support Spark 2.3 or earlier. What @cdmikechen 
mentioned, asking users to take the additional step of dropping the `avro 1.8.2` 
jars into Spark's classpath, could be one option.
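   If users are asked to drop `avro 1.8.2` jars into Spark's classpath, it can help to confirm which jar a JVM actually resolved. A minimal JDK-only sketch, assuming nothing beyond the standard `ProtectionDomain`/`CodeSource` APIs; the class name `AvroJarLocator` is hypothetical and not part of Hudi, Spark, or Avro:

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical diagnostic helper (not a Hudi/Spark API): reports where a class was loaded from. */
final class AvroJarLocator {

    private AvroJarLocator() {}

    /**
     * Returns the jar or directory a class was loaded from, or empty if the class
     * is not on the classpath or has no code source (e.g. JDK bootstrap classes).
     */
    static Optional<URL> locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false, AvroJarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? Optional.empty() : Optional.ofNullable(source.getLocation());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

   Calling `AvroJarLocator.locate("org.apache.avro.Schema")` from code running in the driver or an executor would show which Avro jar that JVM picked up. In Spark, user code and Spark's own jars may sit behind different class loaders, so this sketch only reflects the loader that loaded `AvroJarLocator` itself.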
   
   If we agree that this additional step is acceptable, either @cdmikechen or I 
can create a new PR based on this one, with the following changes:
   - Upgrade the parquet version
   - Roll back the Timestamp conversion to a Logical Type, and continue to support 
it as a String
   
   It appears that with the above two changes, this PR can be in a mergeable 
state. We can continue work on the Timestamp issue in a separate Jira/PR.
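   To make the Timestamp trade-off concrete: per the Avro specification, the `timestamp-millis` logical type stores a `long` counting milliseconds since the Unix epoch, whereas the fallback keeps the value as text. A minimal JDK-only sketch of the two representations of the same instant; `TimestampRepresentationSketch` is a hypothetical name, not Hudi code, and ISO-8601 is only an assumed string format:

```java
import java.time.Instant;

/** Hypothetical illustration (not Hudi code): one instant as an Avro timestamp-millis long vs. a string. */
final class TimestampRepresentationSketch {

    private TimestampRepresentationSketch() {}

    /** Logical-type form: milliseconds since 1970-01-01T00:00:00Z (sub-millisecond precision is dropped). */
    static long toTimestampMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    /** Reverses the logical-type form back into an Instant. */
    static Instant fromTimestampMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    /** String form (assumed ISO-8601 here): readers must parse it themselves and no type information is kept. */
    static String toIsoString(Instant instant) {
        return instant.toString();
    }
}
```

   For example, `Instant.parse("1970-01-01T00:00:01.500Z")` maps to the long `1500` in the logical-type form and to the text `1970-01-01T00:00:01.500Z` in the string form.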


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532354352
 
 
   @vinothchandar @cdmikechen 
   
   I was able to read and write the `Decimal` type correctly by upgrading the 
parquet version to `1.8.2`. This PR needs to be updated accordingly.
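   Some background on why `Decimal` depends on logical-type support: per the Avro specification, a `decimal` logical type stores only the unscaled integer as two's-complement, big-endian bytes, while precision and scale live in the schema. A converter that ignores the logical type sees nothing but opaque bytes. A minimal JDK-only sketch of that encoding; `AvroDecimalSketch` and its methods are hypothetical, not Avro, parquet-avro, or Hudi APIs, and precision is not validated:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

/** Hypothetical helper illustrating Avro's decimal logical-type encoding (not a real library API). */
final class AvroDecimalSketch {

    private AvroDecimalSketch() {}

    /** Encodes the unscaled value at the schema's scale as two's-complement big-endian bytes. */
    static byte[] encode(BigDecimal value, int scale) {
        // Throws ArithmeticException if the value cannot be represented at this scale without rounding.
        BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        return scaled.unscaledValue().toByteArray();
    }

    /** Decodes the bytes using the scale stored in the schema (the bytes alone do not carry it). */
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}
```

   For example, `123.45` at scale 2 has the unscaled value `12345`, which encodes to the two bytes `0x30 0x39`; decoding those bytes without knowing the scale would yield `12345` instead of `123.45`.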
   
   Is there a way we can prioritize this work and get it merged? Is there any 
additional testing I can help with that would give us confidence to merge it? 
@cdmikechen, you mentioned there are still some issues. If you can point them 
out here, I would be willing to help with those as well.


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-16 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532026904
 
 
   > @umehrot2 Great analysis.. Would upgrading parquet-avro help?
   
   Good point @vinothchandar. From a quick look at the `AvroSchemaConverter` code 
in `parquet-avro`, it seems that handling of `LogicalType` conversion was added 
in parquet 1.8.2:
   
   
https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L379
   
   It is not there in `parquet 1.8.1`, which is what Hudi uses right now:
   
https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.1/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java
   
   I will upgrade the parquet version, test it, and update here with what I 
find.


[GitHub] [incubator-hudi] umehrot2 commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-16 Thread GitBox
umehrot2 commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532014549
 
 
   > should we first rebase and resolve the conflicts?
   
   For my testing, I had rebased this PR on top of release-0.5.0. But yes, 
@cdmikechen should probably rebase the PR, although the issue will still exist.