[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583807262 I took a quick look at the code on the Hive 2.0 and 2.1 branches, and it should be the same as 2.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583792529 @danny0405 I submitted a PR for Hive 2.2 compatibility: I copied part of the Hive 2.3 code into Hudi and converted the Hive 2.2 data structures into the 2.3 form for processing. I am not sure whether this approach is reasonable, and there may be problems in the code. Please help review #8911.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1582168044 Looking at this test, the split logic does not appear to be a careless mistake. ![image](https://github.com/apache/hudi/assets/20243868/4b0c1b9f-4520-485b-8728-360270f85d4e)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1581778050 > On yarn ![image](https://user-images.githubusercontent.com/20243868/243678460-41ba247c-32dc-47a7-b383-53e90b14849d.png) The failure here is probably caused by Hive's `mapred.map.tasks.speculative.execution` setting; the failed task was also still in the NEW state. After I set this parameter to false, the task did not fail again across several tests.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580676828 > https://github.com/apache/hudi/blob/2294c52c36ebcdac735e9565fb78a70181ddb3a5/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java#L900-L956 > > Hi @danny0405 @thomasg19930417, I'm confused about the `maxSize` and `counter`, too. Is it a careless mistake? @n3nash Hello, could you help explain this part of the implementation? I think it was added in #1053.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580633139 > > NoClassDefFoundError: org/apache/hadoop/hive/common/StringInternUtils > > we may borrow `StringInternUtils` from hive to support hive version < 2.3 Some minor changes in the Hive API would still require corresponding code changes.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580630562 > > NoClassDefFoundError: org/apache/hadoop/hive/common/StringInternUtils > > we may borrow `StringInternUtils` from hive to support hive version < 2.3 Some minor changes in the Hive API would still require corresponding code changes.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578533981 On yarn ![image](https://github.com/apache/hudi/assets/20243868/41ba247c-32dc-47a7-b383-53e90b14849d)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578529904 I also found another problem. When I set `mapreduce.input.fileinputformat.split.maxsize=1`, there is more than one map task, but one task fails on YARN and is then retried (I don't see any exception here). In the end the job succeeds.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578525134 ![image](https://github.com/apache/hudi/assets/20243868/a1f280b5-cc61-473b-942c-3420e8af33bb)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578523112 ![image](https://github.com/apache/hudi/assets/20243868/cef75b74-0a49-448f-9e8e-661423a16783) Why is it implemented this way? `mapreduce.input.fileinputformat.split.maxsize` (its unit should be bytes) is usually very large. With this logic, all inputs may end up in a single split. I tested it and that is indeed what happens: with the default of 256 MB there is only one map task. When I set this parameter to 1, n map tasks start. Is there any limit on this?
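The observation above (one map task at 256 MB, n map tasks at 1) is what one would expect if the configured value were compared against a count of splits instead of a byte size. The following is a minimal sketch of that hypothesis only. `SplitGroupingSketch` and `groupByCount` are made-up names, and this is a simplified model, not the code from `HoodieCombineHiveInputFormat`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of count-based split grouping; not the Hudi source. */
class SplitGroupingSketch {

    /**
     * Groups split indices 0..numSplits-1 into combined splits that hold at most
     * maxPerGroup entries each. The limit is treated as a COUNT of splits, not bytes.
     */
    static List<List<Integer>> groupByCount(int numSplits, long maxPerGroup) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int pos = 0; pos < numSplits; pos++) {
            current.add(pos);
            boolean groupFull = current.size() >= maxPerGroup;
            boolean lastSplit = pos == numSplits - 1;
            if (groupFull || lastSplit) {
                // Close the current group and start a new one.
                groups.add(current);
                current = new ArrayList<>();
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        long bytes256Mb = 256L * 1024 * 1024;
        // A byte-sized limit used as a count is never reached: everything lands in one group.
        System.out.println(groupByCount(10, bytes256Mb).size()); // prints 1
        // A limit of 1 puts every split in its own group.
        System.out.println(groupByCount(10, 1).size()); // prints 10
    }
}
```

Under this model the number of map tasks depends on how many input splits exist, not on how large they are, which would match the two test results reported in the comment.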
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578020377 I tried modifying it directly, but the Hive query then produces only one map split, which causes the task to exceed the YARN memory limit and fail. With `limit 10` in the query, the data in the rt table can be read normally. Some of the code logic is not clear to me, so I am not sure whether the current change is correct.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578017076 In hive 2.2 ![image](https://github.com/apache/hudi/assets/20243868/f5b9b5b0-07ee-492e-bdcc-9df3b5afa833) In hive 2.3 ![image](https://github.com/apache/hudi/assets/20243868/01dda425-6ce7-4a66-8a77-6f582ca0df58)
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578015039 The main difference is the structure returned when getting the Path -> Alias mapping from MapWork. If we change the Hive version directly, the path-handling parts of HoodieCombineHiveInputFormat need to change. Another change is in HiveInputFormat.pushProjectionsAndFilters. The amount of code to modify should not be large. In Hive 2.3: ![image](https://github.com/apache/hudi/assets/20243868/37b160e9-603a-4b55-878f-235a5b8eb5a9) In Hive 2.2.0: ![image](https://github.com/apache/hudi/assets/20243868/fa862fa2-086b-456c-a5fd-7b43ac70fe5f)
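To make the idea of bridging two shapes of a path-to-alias mapping concrete, here is a rough sketch. It assumes, purely for illustration, that one version keys the map by plain strings and the other by a path object; the screenshots are not reproduced here, so that assumption may not match the real Hive types. `PathToAliasAdapterSketch` and `toTypedKeys` are hypothetical names, and `java.net.URI` is only a stand-in for a Hadoop path type so that the example runs without Hadoop or Hive on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical adapter between two assumed shapes of a path-to-alias map. */
class PathToAliasAdapterSketch {

    /**
     * Converts a string-keyed map into one keyed by a typed path object,
     * preserving insertion order and copying each alias list.
     * URI.create throws IllegalArgumentException for strings that are not valid URIs.
     */
    static Map<URI, List<String>> toTypedKeys(Map<String, List<String>> byString) {
        Map<URI, List<String>> typed = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : byString.entrySet()) {
            typed.put(URI.create(entry.getKey()), new ArrayList<>(entry.getValue()));
        }
        return typed;
    }
}
```

Converting once at the boundary would let the rest of the path-handling code work against a single shape, at the cost of copying the map.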
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576517627 I tried to make the modifications and found that it is not as simple as copying the related class and renaming it. The API changed between Hive 2.2 and Hive 2.3, and some other modifications are involved as well, so I don't think there is a good way to be compatible with both versions at the same time.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576140270 Thank you for your reply. Once the test is successful, I will submit a PR.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576108820 > Yeah, do you have interest to contribute a fix for this? We can write our own impl for `StringInternUtils` because it does not have good version compatibility. ![image](https://github.com/apache/hudi/assets/20243868/21626469-00c8-456c-a4ae-2a11009a83b2) I'm not sure whether this is the only place that references the class, but the search results suggest it is. If so, we only need to copy Hive's implementation into hudi-mr. Could you help check whether this is correct?
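As a rough idea of what a project-owned replacement for a string-interning helper could look like, here is a minimal sketch. It is not Hive's `StringInternUtils` and not the change proposed in the thread: the class name `StringInternSketch` and both method names are hypothetical, and it uses only the JDK's `String.intern()` so that it has no Hive dependency.

```java
import java.util.List;
import java.util.ListIterator;

/** Hypothetical stand-in for a project-owned string interning helper; not Hive's class. */
final class StringInternSketch {

    private StringInternSketch() {
        // Utility class, not meant to be instantiated.
    }

    /** Returns the canonical (interned) instance of s, or null when s is null. */
    static String internIfNotNull(String s) {
        return s == null ? null : s.intern();
    }

    /** Replaces every element of the list with its interned instance, in place. */
    static List<String> internStringsInList(List<String> list) {
        if (list != null) {
            ListIterator<String> it = list.listIterator();
            while (it.hasNext()) {
                it.set(internIfNotNull(it.next()));
            }
        }
        return list;
    }
}
```

Keeping such a helper inside the project removes the runtime dependency on a class that only some Hive versions ship, which is the source of the `NoClassDefFoundError` discussed in this thread.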
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575949058 Could these methods be extended so that they are more compatible with Hive?
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575945668 I confirmed that this class does not exist in the Hive version we currently use, so only some Hive 2.x versions are supported.
[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception
thomasg19930417 commented on issue #8882: URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575944762 I found a similar issue: https://github.com/apache/hudi/issues/3795