[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-08 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583807262

   I took a quick look at the Hive branch-2.0 and branch-2.1 code, and it 
appears to be the same as 2.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-08 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1583792529

   @danny0405 I submitted a PR (#8911) to be compatible with Hive 2.2. It 
copies part of the Hive 2.3 code into Hudi and converts the Hive 2.2 data 
structure into the 2.3 form for processing. I'm not sure whether this approach 
is reasonable, and there may be problems in the code. Please help review it.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-08 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1582168044

   Looking at this test, the split logic does not seem to be a careless mistake.
   
   
![image](https://github.com/apache/hudi/assets/20243868/4b0c1b9f-4520-485b-8728-360270f85d4e)
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-07 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1581778050

   > On yarn 
![image](https://user-images.githubusercontent.com/20243868/243678460-41ba247c-32dc-47a7-b383-53e90b14849d.png)
   
   The failure here is probably caused by Hive's 
`mapred.map.tasks.speculative.execution`. The failed task is also in the NEW 
state. After setting this parameter to false and running several tests, the 
task no longer fails.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-07 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580676828

   > 
https://github.com/apache/hudi/blob/2294c52c36ebcdac735e9565fb78a70181ddb3a5/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java#L900-L956
   > 
   > Hi @danny0405 @thomasg19930417, I'm confused about the `maxSize` and 
`counter`, too. Is it a careless mistake?
   
   @n3nash Hello, could you help explain this implementation? I think it was 
added in #1053.
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-07 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580633139

   > > NoClassDefFoundError: org/apache/hadoop/hive/common/StringInternUtils
   > 
   > we may borrow `StringInternUtils` from hive to support hive version < 2.3
   
   There are also some minor Hive API changes that would still require changes on our side.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-07 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1580630562

   > > NoClassDefFoundError: org/apache/hadoop/hive/common/StringInternUtils
   > 
   > we may borrow `StringInternUtils` from hive to support hive version < 2.3
   
   There are also some minor Hive API changes that would still require changes on our side.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578533981

   On yarn
   
![image](https://github.com/apache/hudi/assets/20243868/41ba247c-32dc-47a7-b383-53e90b14849d)
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578529904

   I also found another problem. When I set 
mapreduce.input.fileinputformat.split.maxsize=1, the number of map tasks is 
greater than 1, but one task fails on YARN and is then retried (I don't see any 
exceptions here). The task eventually succeeds.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578525134

   
![image](https://github.com/apache/hudi/assets/20243868/a1f280b5-cc61-473b-942c-3420e8af33bb)
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578523112

   
![image](https://github.com/apache/hudi/assets/20243868/cef75b74-0a49-448f-9e8e-661423a16783)
   Why is it implemented this way? `mapreduce.input.fileinputformat.split.maxsize` 
(whose unit should be bytes) is usually very large, so with this logic all 
inputs may end up in a single split. I tested this and that is indeed what 
happens: with the default of 256 MB there is only one map task, and when I set 
the parameter to 1, n map tasks start. Is there any limit on this?
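
   The behaviour described above can be illustrated with a minimal sketch. It 
is not the Hudi code: `SplitGroupingSketch` and `groupSizes` are hypothetical 
names, and the sketch simply assumes that a byte-valued limit is compared 
against a running count of splits, which is how the comment reads the logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: groups splits by COUNT using a limit that is really a byte size. */
final class SplitGroupingSketch {

  /** Packs numSplits input splits into groups of at most maxSize splits each. */
  static List<Integer> groupSizes(int numSplits, long maxSize) {
    List<Integer> groups = new ArrayList<>();
    int counter = 0; // number of splits in the current group
    for (int pos = 0; pos < numSplits; pos++) {
      counter++;
      // Close the group when it reaches the limit or the input is exhausted.
      if (counter >= maxSize || pos == numSplits - 1) {
        groups.add(counter);
        counter = 0;
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    long defaultMax = 256L * 1024 * 1024; // 256 MB expressed in bytes
    // A huge limit is never reached by a split count, so everything lands in one group.
    System.out.println(groupSizes(10, defaultMax).size());
    // A limit of 1 closes a group after every split.
    System.out.println(groupSizes(10, 1).size());
  }
}
```

   Under that assumption, ten splits form one group with the 256 MB default and 
ten groups with a limit of 1, which matches the one map task versus n map tasks 
observed in the test.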





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578020377

   I tried modifying it directly, but the Hive query then produces only one map 
split, which causes the task to exceed the YARN memory limit and fail. With 
`limit 10` in the query, the data in the rt table can be read normally. Some of 
the code logic is not very clear to me, so I am not sure whether the current 
change is correct.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578017076

   In hive 2.2
   
![image](https://github.com/apache/hudi/assets/20243868/f5b9b5b0-07ee-492e-bdcc-9df3b5afa833)
   
   In hive 2.3
   
![image](https://github.com/apache/hudi/assets/20243868/01dda425-6ce7-4a66-8a77-6f582ca0df58)
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-06 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1578015039

   The main difference is the structure returned when obtaining the Path -> 
Alias mapping from MapWork. If we simply change the Hive version, the parts of 
HoodieCombineHiveInputFormat that handle these paths need to change. The other 
change is in HiveInputFormat.pushProjectionsAndFilters. The amount of code to 
modify should not be large.
   
   In  hive2.3  
   
![image](https://github.com/apache/hudi/assets/20243868/37b160e9-603a-4b55-878f-235a5b8eb5a9)
   In hive2.2.0
   
![image](https://github.com/apache/hudi/assets/20243868/fa862fa2-086b-456c-a5fd-7b43ac70fe5f)
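
   A rough sketch of what adapting one shape of the Path -> Alias mapping to 
the other could look like. The screenshots are not reproduced in this text, so 
the sketch assumes the older structure is keyed by a path string and the newer 
one by a path object. `java.net.URI` stands in for the Hadoop path type so the 
example runs without Hive on the classpath, and `PathToAliasesSketch` and 
`toPathKeyed` are hypothetical names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: re-keys a String-keyed path-to-aliases map by a path-like object. */
final class PathToAliasesSketch {

  /** Converts the assumed older form into the assumed newer form, preserving insertion order. */
  static LinkedHashMap<URI, ArrayList<String>> toPathKeyed(Map<String, ArrayList<String>> byString) {
    LinkedHashMap<URI, ArrayList<String>> byPath = new LinkedHashMap<>();
    for (Map.Entry<String, ArrayList<String>> e : byString.entrySet()) {
      // Copy the alias list so later changes to the source map do not leak through.
      byPath.put(URI.create(e.getKey()), new ArrayList<>(e.getValue()));
    }
    return byPath;
  }

  public static void main(String[] args) {
    Map<String, ArrayList<String>> older = new LinkedHashMap<>();
    older.put("hdfs://nn/warehouse/t1", new ArrayList<>(Arrays.asList("t1_rt")));
    System.out.println(toPathKeyed(older));
  }
}
```

   Converting once at the boundary would let the rest of the path handling work 
against a single form, at the cost of one extra copy of the map.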
   





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-05 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576517627

   I tried making the modifications and found that it is not just a matter of 
copying the related class and renaming it. The API changed between Hive 2.2 and 
Hive 2.3, and some other modifications are involved as well, so I don't think 
there is a good way to be compatible with both versions at the same time.





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-05 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576140270

   Thank you for your reply. Once the test is successful, I will submit a PR





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-05 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1576108820

   > Yeah, do you have intreast to contribute a fix for this? We can write our 
own impl for `StringInternUtils` because it does not have good version 
compatibility.
   
![image](https://github.com/apache/hudi/assets/20243868/21626469-00c8-456c-a4ae-2a11009a83b2)
   
   I'm not sure whether this is the only place that references the class, but 
from the search results it appears so. In that case, we only need to copy 
Hive's implementation into hudi-mr. Could you check whether this is correct?
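
   For illustration, a local replacement for a string-interning helper could be 
as small as the sketch below. This is not Hive's implementation: the class name 
`LocalStringInterner` and both method names are chosen here as assumptions, and 
the real class may cover more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

/** Illustrative only: a self-contained stand-in for a string-interning utility. */
final class LocalStringInterner {

  /** Returns the canonical (interned) instance of s, or null if s is null. */
  static String internIfNotNull(String s) {
    return s == null ? null : s.intern();
  }

  /** Replaces every element of a mutable list with its interned instance, in place. */
  static List<String> internStringsInList(List<String> list) {
    if (list != null) {
      ListIterator<String> it = list.listIterator();
      while (it.hasNext()) {
        it.set(internIfNotNull(it.next()));
      }
    }
    return list;
  }

  public static void main(String[] args) {
    // Two equal strings held as distinct objects.
    List<String> aliases = new ArrayList<>(Arrays.asList(new String("t1_rt"), new String("t1_rt")));
    internStringsInList(aliases);
    // After interning, both slots reference the same instance.
    System.out.println(aliases.get(0) == aliases.get(1));
  }
}
```

   Keeping such a helper inside the module removes the runtime dependency on a 
class that only some Hive versions ship.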





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-04 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575949058

   Could these methods be extended to improve compatibility with Hive?





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-04 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575945668

   I confirmed that this class does not exist in the Hive version we currently 
use, so Hudi supports only some Hive 2.x versions.
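
   One way to confirm this kind of finding on a given classpath is a small 
reflection check. The sketch below is a generic Java probe, not part of Hudi or 
Hive; `ClassPresenceCheck` and `isPresent` are hypothetical names.

```java
/** Illustrative only: reports whether a class can be found on the current classpath. */
final class ClassPresenceCheck {

  /** Returns true if className can be loaded, without running its static initializers. */
  static boolean isPresent(String className) {
    try {
      Class.forName(className, false, ClassPresenceCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Prints false when the Hive jars on the classpath do not ship this class.
    System.out.println(isPresent("org.apache.hadoop.hive.common.StringInternUtils"));
  }
}
```

   Running it with the same classpath as the failing job shows whether the 
NoClassDefFoundError comes from the Hive version or from a packaging problem.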





[GitHub] [hudi] thomasg19930417 commented on issue #8882: [SUPPORT] Using hive to read rt table exception

2023-06-04 Thread via GitHub


thomasg19930417 commented on issue #8882:
URL: https://github.com/apache/hudi/issues/8882#issuecomment-1575944762

   I found a similar issue   https://github.com/apache/hudi/issues/3795

