[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions

2020-12-11 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247770#comment-17247770 ]

Vinoth Chandar commented on HUDI-1414:
--

Please go for it, [~linshan]. Apologies for the very late response.

> HoodieInputFormat support for bucketed partitions
> -
>
> Key: HUDI-1414
> URL: https://issues.apache.org/jira/browse/HUDI-1414
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Presto Integration
> Reporter: Satish Kotha
> Assignee: linshan-ma
> Priority: Major
> Fix For: 0.8.0
>
>
> When querying a Hudi partition through Presto, we get the following error:
> {code}
> Presto error: {u'errorCode': 13,
>  u'message': u'Presto cannot read bucketed partition in an input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat',
>  u'errorType': u'USER_ERROR',
>  u'failureInfo': {u'suppressed': [],
>   u'message': u'Presto cannot read bucketed partition in an input format with UseFileSplitsFromInputFormat annotation: HoodieInputFormat',
>   u'type': u'com.facebook.presto.spi.PrestoException',
>   u'stack': [u'com.facebook.presto.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:432)',
>    u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)',
>    u'java.base/java.security.AccessController.doPrivileged(Native Method)',
>    u'java.base/javax.security.auth.Subject.doAs(Subject.java:361)',
>    u'org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1816)',
>    u'com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)',
>    u'com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)',
>    u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:430)',
>    u'com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:330)',
>    u'com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:116)',
>    u'com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:259)',
>    u'com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)',
>    u'com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)',
>    u'com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)',
>    u'com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)',
>    u'java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)',
>    u'java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)',
>    u'java.base/java.lang.Thread.run(Thread.java:834)']},
>  u'errorName': u'NOT_SUPPORTED'}
> {code}
> Figure out how to add support for bucketed partitions.
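
The error above is raised while Presto's Hive split loader (`BackgroundHiveSplitLoader.loadPartition` in the trace) handles the partition. The sketch below is a simplified model of that rejection, not Presto's or Hudi's actual code. It assumes Presto detects the marker by looking for an annotation whose simple name is `UseFileSplitsFromInputFormat` on the input format class, and refuses the partition when the table is also bucketed. `HoodieInputFormat`, `PlainInputFormat` and the local annotation are hypothetical stand-ins.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;
import java.util.Optional;

public class BucketedSplitCheck {
    // Hypothetical stand-in for Hudi's marker annotation; only its simple name is inspected.
    @Retention(RetentionPolicy.RUNTIME)
    @interface UseFileSplitsFromInputFormat {}

    // Hypothetical stand-in for HoodieInputFormat, carrying the marker annotation.
    @UseFileSplitsFromInputFormat
    static class HoodieInputFormat {}

    // Hypothetical input format without the marker, for comparison.
    static class PlainInputFormat {}

    // True if the input format's class carries an annotation named UseFileSplitsFromInputFormat.
    static boolean shouldUseFileSplitsFromInputFormat(Object inputFormat) {
        return Arrays.stream(inputFormat.getClass().getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch(name -> name.equals("UseFileSplitsFromInputFormat"));
    }

    // Returns the rejection message for a marked input format on a bucketed table, else empty.
    static Optional<String> checkPartition(Object inputFormat, boolean bucketed) {
        if (shouldUseFileSplitsFromInputFormat(inputFormat) && bucketed) {
            return Optional.of("Presto cannot read bucketed partition in an input format with "
                    + "UseFileSplitsFromInputFormat annotation: "
                    + inputFormat.getClass().getSimpleName());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(checkPartition(new HoodieInputFormat(), true));
        System.out.println(checkPartition(new HoodieInputFormat(), false));
        System.out.println(checkPartition(new PlainInputFormat(), true));
    }
}
```

Under these assumptions only the combination "marked input format + bucketed table" fails, which is why the same Hudi table reads fine when it is not declared as bucketed.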



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions

2020-11-24 Thread linshan-ma (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238081#comment-17238081 ]

linshan-ma commented on HUDI-1414:
--

I'm interested in this ticket. I'd like to try it.






[jira] [Commented] (HUDI-1414) HoodieInputFormat support for bucketed partitions

2020-11-23 Thread Vinoth Chandar (Jira)


[ https://issues.apache.org/jira/browse/HUDI-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237785#comment-17237785 ]

Vinoth Chandar commented on HUDI-1414:
--

??I have a requirement to compact datalake but need bucketing on top of
compaction so that during query time, only the files relevant to the "id" in
query would be scanned. Is that supported in Hudi? If not, is it possible to
extend Hudi to support it? Hello Team - we have a need for bucketing our
datasets (primarily to keep the parquet file size optimized for faster read).
We see that Hudi doesn't support bucketing now. Are there any plans to support
bucketing in the future???

??I have a requirement to compact datalake but need bucketing on top of
compaction so that during query time, only the files relevant to the "id" in
query would be scanned. Is that supported in Hudi? If not, is it possible to
extend Hudi to support it? Following up on the email "Bucketing in Hudi", we
would like to schedule a meeting to understand and estimate the code changes
needed to achieve bucketing in Hudi. The high level requirements are as
detailed in email but we could chat further in the meeting to get into
specifics. When would be the earliest we could have this discussion???
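
The requests above rely on bucketing letting a query on a single "id" scan only the files that can contain it. A minimal sketch of generic hash bucketing follows, assuming a fixed bucket count and exactly one file per bucket; the `bucket-NNNNN.parquet` naming and the use of `String.hashCode` are hypothetical choices for illustration, not Hudi's layout.

```java
import java.util.List;

public class BucketPruningSketch {
    // Generic hash bucketing: a key always maps to the same bucket in [0, numBuckets).
    static int bucketFor(String id, int numBuckets) {
        return Math.floorMod(id.hashCode(), numBuckets);
    }

    // Hypothetical layout: exactly one data file per bucket.
    static String fileForBucket(int bucket) {
        return String.format("bucket-%05d.parquet", bucket);
    }

    // A point lookup on id needs only the file of the id's bucket, not all numBuckets files.
    static List<String> filesToScan(String id, int numBuckets) {
        return List.of(fileForBucket(bucketFor(id, numBuckets)));
    }

    public static void main(String[] args) {
        int numBuckets = 8;
        for (String id : new String[] {"id-1", "id-2", "id-42"}) {
            System.out.println(id + " -> " + filesToScan(id, numBuckets));
        }
    }
}
```

The pruning only holds if writers and readers agree on the same hash function and bucket count; changing either would send lookups to the wrong file.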
 
 
 



