Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]
codope closed issue #10315: [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage URL: https://github.com/apache/hudi/issues/10315 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1933446992 @lei-su-awx Closing this issue then, thanks. Please reopen it if you see the issue again.
lei-su-awx commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1884659448 @ad1happy2go Sorry for the late reply. I did not test this any further because I ran into new issues. But I think `.filter("operation_type = 'update'")` can do what I want, so never mind this issue.
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1884502236 @lei-su-awx Any updates here? Do you still have this issue?
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1871878668 @lei-su-awx Did you see the behaviour I mentioned? Any more updates here please?
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1862294498 @lei-su-awx You don't need to read from that partition directory. You should give only the parent directory (the table path). With your `.filter("operation_type = 'update'")`, partition pruning will take place and only that partition's data will be processed.
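The approach suggested above could be sketched in PySpark roughly as follows. This is a minimal sketch, not code from the thread: the helper names `partition_predicate` and `read_partitions` are hypothetical, `spark` is assumed to be an existing SparkSession with the Hudi bundle available, and `operation_type` is assumed to be the partition column, as the filter in the thread suggests.

```python
def partition_predicate(column, values):
    """Build a SQL predicate string that selects the given partition values.

    Hypothetical helper for illustration; values containing a single quote
    are rejected rather than escaped.
    """
    values = [str(v) for v in values]
    if not values:
        raise ValueError("at least one partition value is required")
    if any("'" in v for v in values):
        raise ValueError("partition values must not contain single quotes")
    quoted = ", ".join(f"'{v}'" for v in values)
    if len(values) == 1:
        return f"{column} = {quoted}"
    return f"{column} IN ({quoted})"


def read_partitions(spark, base_path, column, values):
    """Open a streaming read on the table path, filtered on the partition column.

    `base_path` is the table's parent directory (the one that holds `.hoodie`),
    not an individual partition directory.
    """
    return (
        spark.readStream.format("hudi")
        .load(base_path)
        .filter(partition_predicate(column, values))
    )
```

For example, `read_partitions(spark, base_path, "operation_type", ["update"])` applies the same filter as the one quoted in the thread. Whether the other partitions are actually skipped depends on partition pruning taking effect, as described in the comment above.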
lei-su-awx commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1853172037 @ad1happy2go I tried to read only the files under that partition using Spark (`spark.readStream`), but an error was thrown: no `.hoodie` folder exists in the partition path. I found that the `.hoodie` folder exists only under the table path.
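The error described above follows from the table metadata living in `.hoodie` under the table path only. As a rough illustration, the table path can be recovered from a partition directory by walking up until a `.hoodie` directory is found. This is a sketch under stated assumptions: `find_hudi_base_path` is a hypothetical helper, and it only handles a local filesystem; a table on HDFS or object storage would need the corresponding filesystem API instead.

```python
from pathlib import Path


def find_hudi_base_path(path):
    """Return the nearest directory at or above `path` that contains `.hoodie`.

    Returns None if no such directory is found. Local filesystem only.
    """
    start = Path(path).resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".hoodie").is_dir():
            return candidate
    return None
```

The returned directory is the one to pass to `load(...)`; the partition is then selected with a filter rather than through the path.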
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1852418826 @lei-su-awx If the table is partitioned, then it should read only the files under that partition. Are you seeing different behaviour, i.e. is it reading all files?
danny0405 commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851775346
> but I want a config that can tell source that only reads the partition that in my configs so I do not need to use filter

That does not follow the common intuition.
lei-su-awx commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851457169 Hi @danny0405, do you mean like this: https://github.com/apache/hudi/assets/19327659/946c95b5-77f5-4d34-a315-a284c6b95b37 I tried this, but it still reads the other partitions' data files to resolve the schema. I think this kind of filter takes effect only after the source has loaded the files of all partitions, but I want a config that tells the source to read only the partitions listed in my configs.
danny0405 commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1851443273 Did you try adding a filter condition on the partition fields?