hudi-bot opened a new issue, #14709:
URL: https://github.com/apache/hudi/issues/14709

   When querying a hoodie partition through Presto, we get the following error:
   
   {code}
   Presto error: {u'errorCode': 13, u'message': u'Presto cannot read bucketed 
partition in an input format with UseFileSplitsFromInputFormat annotation: 
HoodieInputFormat', u'errorType': u'USER_ERROR', u'failureInfo': 
{u'suppressed': [], u'message': u'Presto cannot read bucketed partition in an 
input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat', 
u'type': u'com.facebook.presto.spi.PrestoException', u'stack': 
[u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
 
u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
 u'java.base/java.security.AccessController.doPrivileged(Native Method)', 
u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)', 
u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
 
u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)', 
u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
 
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
 
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
 
u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
 
u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
 
u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
 
u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
 u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)', 
u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
 u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)', 
u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
 u'java.base/java.lang.Thread.run(Thread.java:834)']}, u'errorName': 
u'NOT_SUPPORTED'}
   {code}
   
   Figure out how to add support for bucketed partitions.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1414
   - Type: New Feature
   - Affects version(s):
     - 0.9.0
   
   
   ---
   
   
   ## Comments
   
   24/Nov/20 01:53;vinoth;??I have a requirement to compact datalake but need 
bucketing on top of compaction so that during query time, only the files 
relevant to the "id" in query would be scanned. Is that supported in Hudi? If 
not, is it possible to extend Hudi to support it? Hello Team - we have a need 
for bucketing our datasets (primarily to keep the parquet file size optimized 
for faster read). We see that Hudi doesn't support bucketing now. Are there any 
plans to support bucketing in the future? ??
   ??I have a requirement to compact datalake but need bucketing on top of 
compaction so that during query time, only the files relevant to the "id" in 
query would be scanned. Is that supported in Hudi? If not, is it possible to 
extend Hudi to support it? Following up on the email "Bucketing in Hudi", we 
would like to schedule a meeting to understand and estimate the code changes 
needed to achieve bucketing in Hudi. The high-level requirements are as 
detailed in the email, but we could chat further in the meeting to get into 
specifics. When would be the earliest we could have this discussion? ??
    
    
    ;;;
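
   For readers unfamiliar with the request quoted above, the idea of bucketing on an "id" column so that a lookup scans only the relevant file can be sketched as follows. This is a minimal illustration under stated assumptions: the names `bucket_for`, `write_bucketed`, and `lookup`, the bucket count, and the use of CRC32 as the hash are all hypothetical and are not how Hudi, Hive, or Presto implement bucketing.

```python
# Hypothetical sketch of hash bucketing and bucket pruning.
# The hash function (CRC32) and all names here are illustrative assumptions,
# not the actual Hudi, Hive, or Presto implementation.
import zlib

NUM_BUCKETS = 4  # assumed bucket count


def bucket_for(record_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record id to a bucket using a stable hash."""
    return zlib.crc32(record_id.encode("utf-8")) % num_buckets


def write_bucketed(records):
    """Group records into one list (standing in for a file) per bucket."""
    files = {b: [] for b in range(NUM_BUCKETS)}
    for rec in records:
        files[bucket_for(rec["id"])].append(rec)
    return files


def lookup(files, record_id):
    """Scan only the single bucket that can contain record_id."""
    bucket = bucket_for(record_id)
    return [rec for rec in files[bucket] if rec["id"] == record_id]


records = [{"id": f"user-{i}", "value": i} for i in range(20)]
files = write_bucketed(records)
match = lookup(files, "user-7")
```

   Because the bucket is a pure function of the id, a query filtering on that id can skip every other bucket; the remaining files are never opened.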
   
   ---
   
   24/Nov/20 12:18;linshan;I'm interested in this ticket. I want to try it.;;;
   
   ---
   
   11/Dec/20 09:09;vinoth;Please go for it, [~linshan]. Apologies for the very 
late response.;;;

