[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427233#comment-15427233
 ] 

Prasanth Jayachandran edited comment on HIVE-14574 at 8/18/16 9:55 PM:
---

The 3-byte difference will still be there, depending on which split strategy is 
chosen. If a big file is handled by the ETL split strategy, the first split 
starts at offset 3. If it is handled by the BI split strategy, the first split 
starts at offset 0. My fix addressed strategies being chosen inconsistently 
depending on whether the AM cache is on or off. 
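
To illustrate why the start offset matters for consistent splits, here is a minimal sketch. It assumes placement is derived from a hash of the file path and the split's start offset; that assumption, the class and method names, the CRC32 hash, and the sample path are all made up for illustration and are not taken from the patch. Under that assumption, the first split of the same file can land on a different node depending on whether it starts at offset 0 or offset 3.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: maps a split to a node index by hashing (path, start offset). */
final class SplitPlacementSketch {

    /** Returns an index in [0, nodeCount) derived from the path and the split's start offset. */
    static int nodeIndexFor(String path, long startOffset, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update((path + "#" + startOffset).getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is a valid index.
        return (int) (crc.getValue() % nodeCount);
    }

    public static void main(String[] args) {
        String path = "/warehouse/t/000000_0"; // made-up path, for illustration only
        // The same file, with the first split starting at 0 (BI) or 3 (ETL).
        System.out.println("start 0 -> node " + nodeIndexFor(path, 0L, 8));
        System.out.println("start 3 -> node " + nodeIndexFor(path, 3L, 8));
    }
}
```

The two calls are not guaranteed to disagree for any given path, but nothing forces them to agree either, which is why the strategy has to be chosen consistently.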


> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427319#comment-15427319
 ] 

Sergey Shelukhin edited comment on HIVE-14574 at 8/18/16 10:59 PM:
---

We can easily get unique IDs by taking the ZK node name (which is unique and 
sequential). However, since the new node is re-added at the tail on every 
restart, it throws everything off.
What we want conceptually is for restarted nodes to take the same position in 
the order as the nodes they replaced, but that is difficult to achieve (or 
impossible; I am not sure we have such a concept with Slider). We can just have 
sequential numbers in ZK to take, with every registering node fighting for the 
lowest number. I wonder if there's already a primitive for that in Curator ;) 
That way the replacement nodes take the place of the nodes that died most of 
the time, and leave the running ones undisturbed.
We can also assume we usually restart in the same place, and order by the first 
time there was LLAP on that particular node for that cluster, then by name. 
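
The "fight for the lowest number" idea can be sketched roughly as follows. This is an in-memory stand-in only: the class and method names are hypothetical, and it does not use the ZooKeeper or Curator APIs. It just shows the intended ordering behaviour, where a newly registered node takes the lowest free slot and running nodes keep theirs.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the ZK-backed slot idea; not ZooKeeper or Curator API. */
final class SlotRegistrySketch {
    // Slot number -> name of the worker currently holding it.
    private final TreeMap<Integer, String> slots = new TreeMap<>();

    /** Claims the lowest unoccupied slot number for the worker and returns it. */
    synchronized int register(String worker) {
        int slot = 0;
        while (slots.containsKey(slot)) {
            slot++;
        }
        slots.put(slot, worker);
        return slot;
    }

    /** Frees a slot when its worker dies or deregisters. */
    synchronized void release(int slot) {
        slots.remove(slot);
    }

    /** Snapshot of the current ordering, lowest slot first. */
    synchronized Map<Integer, String> ordering() {
        return new TreeMap<>(slots);
    }
}
```

If worker-a, worker-b and worker-c register in that order and worker-b then dies, the next node to register takes slot 1, while worker-a and worker-c stay at 0 and 2. In a real deployment the claim would have to be atomic across processes; one possibility, not verified against this patch, is to create one ephemeral znode per slot and move on to the next number when creation fails because the node already exists.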








[jira] [Comment Edited] (HIVE-14574) use consistent hashing for LLAP consistent splits to alleviate impact from cluster changes

2016-08-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431860#comment-15431860
 ] 

Sergey Shelukhin edited comment on HIVE-14574 at 8/23/16 12:30 AM:
---

This patch actually breaks unique IDs because they are no longer propagated to 
clients... I'm fixing this in the other JIRA and may just resolve this as a dup 
of that, or port the fix back into this patch.
The reason is that worker-x is also unique (and useful for debug purposes too), 
so there's no point in the GUID. Is there?


> use consistent hashing for LLAP consistent splits to alleviate impact from 
> cluster changes
> --
>
> Key: HIVE-14574
> URL: https://issues.apache.org/jira/browse/HIVE-14574
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-14574.01.patch, HIVE-14574.patch
>
>



