hudi-bot opened a new issue, #14709:
URL: https://github.com/apache/hudi/issues/14709
When querying a hoodie partition through Presto, we get the following error:
{code}
Presto error: {u'errorCode': 13, u'message': u'Presto cannot read bucketed
partition in an input format with UseFileSplitsFromInputFormat annotation:
HoodieInputFormat', u'errorType': u'USER_ERROR', u'failureInfo':
{u'suppressed': [], u'message': u'Presto cannot read bucketed partition in an
input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat',
u'type': u'com.facebook.presto.spi.PrestoException', u'stack':
[u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
u'java.base/java.security.AccessController.doPrivileged(Native Method)',
u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)',
u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)',
u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)',
u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)',
u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
u'java.base/java.lang.Thread.run(Thread.java:834)']}, u'errorName':
u'NOT_SUPPORTED'}
{code}
Figure out how to add support for bucketed partitions.
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-1414
- Type: New Feature
- Affects version(s): 0.9.0
---
## Comments
24/Nov/20 01:53;vinoth;??I have a requirement to compact datalake but need
bucketing on top of compaction so that during query time, only the files
relevant to the "id" in query would be scanned. Is that supported in Hudi? If
not, is it possible to extend Hudi to support it? Hello Team - we have a need
for bucketing our datasets (primarily to keep the parquet file size optimized
for faster read). We see that Hudi doesn't support bucketing now. Are there any
plans to support bucketing in the future???
??I have a requirement to compact datalake but need bucketing on top of
compaction so that during query time, only the files relevant to the "id" in
query would be scanned. Is that supported in Hudi? If not, is it possible to
extend Hudi to support it? Following up on the email "Bucketing in Hudi", we
would like to schedule a meeting to understand and estimate the code changes
needed to achieve bucketing in Hudi. The high level requirements are as
detailed in email but we could chat further in the meeting to get into
specifics. When would be the earliest we could have this discussion? ??
;;;
---
24/Nov/20 12:18;linshan;I'm interested in this ticket. I want to try it.;;;
---
11/Dec/20 09:09;vinoth;Please go for it, [~linshan]. Apologies for the very
late response. ;;;