Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2024-02-07 Thread via GitHub


codope closed issue #10315: [SUPPORT] How to skip some partitions in a table 
when readStreaming in Spark at the init stage
URL: https://github.com/apache/hudi/issues/10315


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2024-02-07 Thread via GitHub


ad1happy2go commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1933446992

   @lei-su-awx Closing this issue then. Thanks. Please reopen it if you see 
the issue again.





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2024-01-10 Thread via GitHub


lei-su-awx commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1884659448

   @ad1happy2go Sorry for the late reply. I did not test this any further 
because I ran into new issues. But I think `.filter("operation_type  'update')` 
can do what I want, so never mind this issue.





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2024-01-10 Thread via GitHub


ad1happy2go commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1884502236

   @lei-su-awx Any updates here? Do you still have this issue?





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-29 Thread via GitHub


ad1happy2go commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1871878668

   @lei-su-awx Did you see the behaviour I mentioned? Any more updates here 
please?





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-19 Thread via GitHub


ad1happy2go commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1862294498

   @lei-su-awx You don't need to read from that partition directory. You 
should give only the parent directory. With your 
`.filter("operation_type  'update')`, partition pruning will take place and 
only that partition's data will be processed.
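
   A minimal sketch of this suggestion, assuming PySpark with the Hudi Spark 
bundle on the classpath. The helper name `read_partition_stream`, the example 
path, and the `=` operator in the predicate are assumptions (the operator is 
missing from the snippet quoted above). Whether pruning avoids touching other 
partitions during stream initialization was not confirmed in this thread.

```python
def read_partition_stream(spark, table_path, partition_filter):
    """Stream-read a Hudi table from its base path, filtered on a partition column.

    spark            -- an existing SparkSession
    table_path       -- the table base path (the directory that contains .hoodie),
                        not an individual partition directory
    partition_filter -- a SQL predicate on the partition column
    """
    return (
        spark.readStream
        .format("hudi")
        .load(table_path)          # load from the table root so Hudi finds its metadata
        .filter(partition_filter)  # predicate on the partition field
    )


# Hypothetical usage; the path and the "=" operator are illustrative assumptions:
# df = read_partition_stream(spark, "/data/hudi/my_table", "operation_type = 'update'")
```

   The predicate is passed as a SQL string so the helper does not depend on 
any particular column API.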





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-12 Thread via GitHub


lei-su-awx commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1853172037

   @ad1happy2go I tried to read only the files under that partition using 
Spark (`spark.readStream`), but an error was thrown: no `.hoodie` folder exists 
in the partition path. I found that the `.hoodie` folder exists only under the 
table path.
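
   To illustrate the layout described here, the following sketch walks up 
from a partition directory to the nearest ancestor that contains a `.hoodie` 
folder. It assumes a local filesystem and is not Hudi's own lookup logic; the 
function name `find_table_base_path` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_table_base_path(path: str) -> Optional[Path]:
    """Return the given path or its nearest ancestor that contains a .hoodie directory.

    Returns None when no such directory is found up to the filesystem root.
    """
    current = Path(path).resolve()
    # Check the path itself first, then each parent in turn.
    for candidate in [current, *current.parents]:
        if (candidate / ".hoodie").is_dir():
            return candidate
    return None
```

   The path returned for a partition directory is the one to pass to `load`, 
rather than the partition directory itself.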





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-12 Thread via GitHub


ad1happy2go commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1852418826

   @lei-su-awx If the table is partitioned, then it should read only the files 
under that partition. Are you seeing different behaviour, i.e. is it reading 
all files?





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-12 Thread via GitHub


danny0405 commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851775346

   > but I want a config that tells the source to read only the partitions 
in my configs, so I do not need to use a filter
   
   That does not follow the common intuition.





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-11 Thread via GitHub


lei-su-awx commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851457169

   Hi @danny0405, do you mean like this:
   https://github.com/apache/hudi/assets/19327659/946c95b5-77f5-4d34-a315-a284c6b95b37
   I tried this, but it still reads other partitions' data files to resolve 
the schema. I think this kind of filter takes effect only after the source has 
loaded all partitions' files, but I want a config that tells the source to read 
only the partitions specified in my configs.





Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-11 Thread via GitHub


danny0405 commented on issue #10315:
URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851443273

   Did you try adding a filter condition on the partition fields?

