[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions
[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247770#comment-17247770 ]

Vinoth Chandar commented on HUDI-1414:
--
Please go for it, [~linshan]. Apologies for the very late response.

> HoodieInputFormat support for bucketed partitions
>
> Key: HUDI-1414
> URL: https://issues.apache.org/jira/browse/HUDI-1414
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Presto Integration
> Reporter: Satish Kotha
> Assignee: linshan-ma
> Priority: Major
> Fix For: 0.8.0
>
> When querying a hoodie partition through Presto, we get the following error:
> {code}
> Presto error: {u'errorCode': 13,
> u'message': u'Presto cannot read bucketed partition in an input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat',
> u'errorType': u'USER_ERROR',
> u'failureInfo': {u'suppressed': [],
> u'message': u'Presto cannot read bucketed partition in an input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat',
> u'type': u'com.facebook.presto.spi.PrestoException',
> u'stack': [u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
> u'java.base/java.security.AccessController.doPrivileged(Native Method)',
> u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)',
> u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
> u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)',
> u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
> u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
> u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
> u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
> u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)',
> u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
> u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)',
> u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
> u'java.base/java.lang.Thread.run(Thread.java:834)']},
> u'errorName': u'NOT_SUPPORTED'}
> {code}
> Figure out how to add support for bucketed partitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
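The error text names the exact combination Presto rejects: a bucketed partition read through an input format that carries the `UseFileSplitsFromInputFormat` annotation. The Java sketch below reproduces that guard in isolation so the interaction is easier to reason about. It is not Presto's or Hudi's code: the annotation declared here, both input-format classes, and `checkPartition` are hypothetical stand-ins, and matching the annotation by its simple name is an assumption about how the engine detects the marker without depending on the annotation class.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class BucketedSplitGuard {

    // Hypothetical stand-in for the marker annotation named in the error message.
    // RUNTIME retention is required so the reflective check below can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseFileSplitsFromInputFormat {}

    // Hypothetical input format carrying the marker (plays the role of HoodieInputFormat).
    @UseFileSplitsFromInputFormat
    static class AnnotatedInputFormat {}

    // Hypothetical input format without the marker.
    static class PlainInputFormat {}

    // Assumption: detection by annotation simple name, so the engine does not
    // need the annotation class itself on its classpath.
    static boolean usesFileSplitsFromInputFormat(Object inputFormat) {
        return Arrays.stream(inputFormat.getClass().getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseFileSplitsFromInputFormat"::equals);
    }

    // Mirrors the condition stated in the error text: bucketed partition AND
    // an input format that supplies its own file splits -> not supported.
    static void checkPartition(Object inputFormat, boolean bucketed) {
        if (bucketed && usesFileSplitsFromInputFormat(inputFormat)) {
            throw new UnsupportedOperationException(
                    "cannot read bucketed partition in an input format with "
                    + "UseFileSplitsFromInputFormat annotation: "
                    + inputFormat.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        checkPartition(new PlainInputFormat(), true);      // bucketed, no marker: allowed
        checkPartition(new AnnotatedInputFormat(), false); // marker, not bucketed: allowed
        try {
            checkPartition(new AnnotatedInputFormat(), true); // both: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints the rejection message only for the third call, the one that combines both conditions; either condition on its own passes the check.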
[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions
[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238081#comment-17238081 ]

linshan-ma commented on HUDI-1414:
--
I'm interested in this ticket. I want to try it.
[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions
[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237785#comment-17237785 ]

Vinoth Chandar commented on HUDI-1414:
--
??I have a requirement to compact the data lake but need bucketing on top of compaction, so that at query time only the files relevant to the "id" in the query would be scanned. Is that supported in Hudi? If not, is it possible to extend Hudi to support it???

??Hello team, we need bucketing for our datasets (primarily to keep the Parquet file size optimized for faster reads). We see that Hudi doesn't support bucketing now. Are there any plans to support bucketing in the future???

??Following up on the email "Bucketing in Hudi", we would like to schedule a meeting to understand and estimate the code changes needed to achieve bucketing in Hudi. The high-level requirements are detailed in the email, but we could chat further in the meeting to get into specifics. When would be the earliest we could have this discussion???
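The first request describes bucketing as a way to scan only the files relevant to the queried "id". A minimal Java sketch of that idea, assuming a hash-modulo bucket assignment: records are grouped by `bucketFor(id)` at write time, and a point lookup reads only the single bucket the id hashes to. `BucketPruningSketch` and its methods are hypothetical, and `String.hashCode()` is an illustrative choice, not the hash function Hive, Presto, or Hudi actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketPruningSketch {

    // Hypothetical bucket assignment: non-negative hash of the key modulo the
    // bucket count. Real engines define their own (different) hash functions.
    static int bucketFor(String id, int numBuckets) {
        return Math.floorMod(id.hashCode(), numBuckets);
    }

    // Write side: group record ids by bucket; each bucket would become its own file.
    static Map<Integer, List<String>> writeBuckets(List<String> ids, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String id : ids) {
            buckets.computeIfAbsent(bucketFor(id, numBuckets), b -> new ArrayList<>()).add(id);
        }
        return buckets;
    }

    // Read side: a point query on "id" only opens the one bucket the id hashes to,
    // then filters within it. All other buckets are pruned without being read.
    static List<String> scanForId(Map<Integer, List<String>> buckets, String id, int numBuckets) {
        List<String> candidates = buckets.getOrDefault(bucketFor(id, numBuckets), List.of());
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.equals(id)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        List<String> ids = List.of("id-1", "id-2", "id-3", "id-4", "id-5", "id-6");
        Map<Integer, List<String>> buckets = writeBuckets(ids, numBuckets);
        System.out.println("buckets: " + buckets);
        System.out.println("scanned only bucket " + bucketFor("id-3", numBuckets)
                + ": " + scanForId(buckets, "id-3", numBuckets));
    }
}
```

One design consequence of hash-modulo bucketing is that the bucket count must stay fixed once data is written: changing it re-maps keys to different buckets, so existing files would have to be rewritten.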