[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishith Agarwal updated HUDI-829:
---------------------------------
    Description: 
[~uditme] Created this ticket to track some discussion on the read/query path of
Spark with Hudi tables.

My understanding is that when you read Hudi tables through spark-shell, some of
your queries are slower due to some sequential activity performed by Spark when
interacting with Hudi tables (even with spark.sql.hive.convertMetastoreParquet,
which can give you the same data reading speed and all the vectorization
benefits). Is this slowness observed during Spark query planning? Can you
please elaborate on this?

  was:
[~uditme] Created this ticket to track some discussion on the read/query path of
Spark with Hudi tables.

My understanding is that when you read Hudi tables through spark-shell, some of
your queries are slower due to some sequential activity performed by Spark when
interacting with Hudi tables. Can you please elaborate on this?


> Efficiently reading hudi tables through spark-shell
> ---------------------------------------------------
>
>                 Key: HUDI-829
>                 URL: https://issues.apache.org/jira/browse/HUDI-829
>             Project: Apache Hudi (incubating)
>          Issue Type: Task
>          Components: Spark Integration
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Major
>
> [~uditme] Created this ticket to track some discussion on the read/query path of
> Spark with Hudi tables.
> My understanding is that when you read Hudi tables through spark-shell, some
> of your queries are slower due to some sequential activity performed by Spark
> when interacting with Hudi tables (even with
> spark.sql.hive.convertMetastoreParquet, which can give you the same data
> reading speed and all the vectorization benefits). Is this slowness observed
> during Spark query planning? Can you please elaborate on this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
