[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-10-03 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636665#comment-16636665
 ] 

Miklos Gergely commented on HIVE-20536:
---

[~ashutoshc] sure, I'll add a test.

> Add Surrogate Keys function to Hive
> ---
>
> Key: HIVE-20536
> URL: https://issues.apache.org/jira/browse/HIVE-20536
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20536.01.patch, HIVE-20536.02.patch, 
> HIVE-20536.03.patch, HIVE-20536.04.patch, HIVE-20536.05.patch, 
> HIVE-20536.06.patch, HIVE-20536.07.patch
>
>
> Surrogate keys are the ability to generate and use unique integers for each row 
> in a table. If we have that ability, then in conjunction with the default clause 
> we get surrogate key functionality. Consider the following DDL:
> create table t1 (a string, b bigint default unique_long());
> We already have a default clause in which you can specify a function to provide 
> values. So what we need is a UDF that can generate unique longs for each row 
> across queries for a table.
> The idea is to use write_id. This is a column in the metastore table 
> TXN_COMPONENTS whose value is determined at compile time and used during query 
> execution. Each query execution generates a new write_id, so we can seed the 
> UDF with this value during compilation.
> We then statically allocate a range for each task from which it can draw the 
> next long. Say we divvy up the 64-bit value such that the first 24 bits keep 
> the original usage of write_id, that is, txns. The next 16 bits are used for 
> task_attempts and the last 24 bits to generate a new long for each row. This 
> implies we can allow about 17M txns, 65K tasks and 17M rows in a task. If you 
> hit any of those limits we can fail the query.
> Implementation-wise: serialize write_id in initialize() of the UDF. Then during 
> execute() we find out which task_attempt the current task is and use it along 
> with write_id to get the starting long, and give a new value on each 
> invocation of execute().
> Here we are assuming write_id can be determined at compile time, which should 
> be the case, but we need to figure out how to get a handle to it.
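
The 24/16/24 bit split proposed in the description can be illustrated with a small Java sketch. This is only a sketch under assumptions: the class name SurrogateKeySketch, the method pack and the constant names are hypothetical and are not taken from the attached patches, and the actual UDF may lay the bits out differently.

```java
// Hypothetical illustration of the proposed layout:
// [ 24 bits write_id | 16 bits task_attempt | 24 bits per-task row counter ]
class SurrogateKeySketch {
    static final int WRITE_ID_BITS = 24;
    static final int TASK_BITS = 16;
    static final int ROW_BITS = 24;

    // Largest value each field can hold (about 17M, 65K and 17M).
    static final long MAX_WRITE_ID = (1L << WRITE_ID_BITS) - 1;
    static final long MAX_TASK = (1L << TASK_BITS) - 1;
    static final long MAX_ROW = (1L << ROW_BITS) - 1;

    // Combines the three fields into one long; rejects out-of-range input,
    // mirroring the "fail the query" behaviour described in the issue.
    static long pack(long writeId, int taskAttempt, long rowId) {
        if (writeId < 0 || writeId > MAX_WRITE_ID) {
            throw new IllegalArgumentException("write_id out of range: " + writeId);
        }
        if (taskAttempt < 0 || taskAttempt > MAX_TASK) {
            throw new IllegalArgumentException("task_attempt out of range: " + taskAttempt);
        }
        if (rowId < 0 || rowId > MAX_ROW) {
            throw new IllegalArgumentException("row counter out of range: " + rowId);
        }
        return (writeId << (TASK_BITS + ROW_BITS))
                | ((long) taskAttempt << ROW_BITS)
                | rowId;
    }

    public static void main(String[] args) {
        // Third row (counter value 3) of task attempt 2 under write_id 1.
        System.out.println(Long.toHexString(pack(1L, 2, 3L)));
    }
}
```

Because each task draws only from its own 24-bit counter, no coordination between tasks is needed at run time; uniqueness follows from the (write_id, task_attempt) prefix being distinct.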



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-10-02 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636480#comment-16636480
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

[~mgergely] [~vgarg] Actually, the MERGE statement already supports default 
values via HIVE-19129.
So, following Miklos's example, this should already work:
{code}
MERGE INTO t1
...
WHEN NOT MATCHED THEN INSERT (a, b) VALUES (t2.a, t2.b, default);
{code}
Miklos, can you add a test for that?



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-25 Thread Andrew Sears (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628196#comment-16628196
 ] 

Andrew Sears commented on HIVE-20536:
-

Is this bringing some of the functionality of IDENTITY / Sequence / 
Auto_increment to Hive? Sounds useful!

Surrogate key functionality could be documented in wiki here.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621404#comment-16621404
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12940397/HIVE-20536.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14992 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13917/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13917/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13917/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12940397 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621374#comment-16621374
 ] 

Hive QA commented on HIVE-20536:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 47s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  1s{color} | {color:blue} ql in master has 2326 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 45s{color} | {color:red} ql: The patch generated 2 new + 640 unchanged - 0 fixed = 642 total (was 640) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 39s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13917/dev-support/hive-personality.sh |
| git revision | master / 487714a |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13917/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-13917/yetus/whitespace-eol.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13917/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.




[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-19 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620716#comment-16620716
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

Negative values for surrogate keys are OK, since a surrogate key by definition 
has no semantic meaning; it's just an identifier.
+1 pending tests.



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-19 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620377#comment-16620377
 ] 

Miklos Gergely commented on HIVE-20536:
---

[~ashutoshc] In Java the first bit of a 64-bit long is the sign bit, so if we 
use all 64 bits and the writeId is big enough to set its first bit, then the 
resulting value will be negative, although the values will still be unique. 
I'm not sure if this is a problem, but we can simply avoid it by allowing only 
63 bits to be used.
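
To make the sign-bit point concrete, here is a minimal Java sketch. It assumes the 24/16/24 split from the issue description, and it assumes (this is not stated in the thread) that the 63-bit variant gives up one bit of the writeId field; the names SignBitSketch, pack64 and pack63 are hypothetical.

```java
// Hypothetical illustration of the sign-bit effect on a packed 64-bit key.
class SignBitSketch {
    // Uses all 64 bits: 24 bits writeId, 16 bits task attempt, 24 bits row counter.
    static long pack64(long writeId, long taskAttempt, long rowId) {
        return (writeId << 40) | (taskAttempt << 24) | rowId;
    }

    // Same layout, but writeId is limited to 23 bits so the sign bit stays 0.
    // Which field loses the bit is an assumption made for this sketch.
    static long pack63(long writeId, long taskAttempt, long rowId) {
        if (writeId < 0 || writeId >= (1L << 23)) {
            throw new IllegalArgumentException("writeId does not fit in 23 bits: " + writeId);
        }
        return pack64(writeId, taskAttempt, rowId);
    }

    public static void main(String[] args) {
        // A writeId with its highest (24th) bit set lands on the sign bit of the long.
        System.out.println(pack64(1L << 23, 0, 0) < 0);
        // The largest 63-bit key is still non-negative.
        System.out.println(pack63((1L << 23) - 1, 0xFFFF, 0xFFFFFF) >= 0);
    }
}
```

Either way the generated values stay unique; the 63-bit variant only trades half of the writeId range for keys that are never negative.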



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620337#comment-16620337
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12940305/HIVE-20536.06.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13901/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13901/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13901/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12940305/HIVE-20536.06.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12940305 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620066#comment-16620066
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12940305/HIVE-20536.06.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 14980 tests 
executed
*Failed tests:*
{noformat}
TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) 
(batchId=243)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) 
(batchId=243)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13894/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13894/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13894/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12940305 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620011#comment-16620011
 ] 

Hive QA commented on HIVE-20536:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  9s{color} | {color:blue} ql in master has 2326 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 45s{color} | {color:red} ql: The patch generated 1 new + 640 unchanged - 0 fixed = 641 total (was 640) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13894/dev-support/hive-personality.sh |
| git revision | master / 9c90776 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13894/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13894/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619945#comment-16619945
 ] 

Vineet Garg commented on HIVE-20536:


Created a jira to support column schema specification in the MERGE statement, 
so that the DEFAULT constraint can be used: 
https://issues.apache.org/jira/browse/HIVE-20590

> Add Surrogate Keys function to Hive
> ---
>
> Key: HIVE-20536
> URL: https://issues.apache.org/jira/browse/HIVE-20536
> Project: Hive
>  Issue Type: Task
>  Components: Hive
> Reporter: Miklos Gergely
> Assignee: Miklos Gergely
> Priority: Major
> Attachments: HIVE-20536.01.patch, HIVE-20536.02.patch, 
> HIVE-20536.03.patch, HIVE-20536.04.patch, HIVE-20536.05.patch, 
> HIVE-20536.06.patch
>
>
> Surrogate keys is an ability to generate and use unique integers for each row 
> in a table. If we have that ability then in conjunction with default clause 
> we can get surrogate keys functionality. Consider following ddl:
> create table t1 (a string, b bigint default unique_long());
> We already have default clause wherein you can specify a function to provide 
> values. So, what we need is udf which can generate unique longs for each row 
> across queries for a table. 
> Idea is to use write_id . This is a column in metastore table TXN_COMPONENTS 
> whose value is determined at compile time to be used during query execution. 
> Each query execution generates a new write_id. So, we can seed udf with this 
> value during compilation.
> Then we statically allocate a range for each task from which it can draw the 
> next long. So, let's say we divvy up the 64-bit write_id such that the first 
> 24 bits belong to its original usage, that is txns. The next 16 bits are used 
> for task_attempts and the last 24 bits to generate a new long for each row. 
> This implies we can allow about 17M txns, 65K tasks and 17M rows in a task. 
> If you hit any of those limits we can fail the query.
> Implementation wise: serialize write_id in initialize() of udf. Then during 
> execute() we find out what task_attempt current task is and use it along with 
> write_id() to get starting long and give a new value on each invocation of 
> execute().
> Here we are assuming write_id can be determined at compile time, which should 
> be the case but we need to figure out how to get handle to it.
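
The 24/16/24-bit split proposed in the description above can be sketched in a 
few lines of Java. This is only an illustration of the arithmetic, assuming the 
write_id occupies the high-order bits, the task attempt the middle bits and the 
per-row counter the low-order bits. The class and method names 
(SurrogateKeyLayout, pack, writeIdOf, taskAttemptOf, rowOf) are hypothetical 
and are not taken from the attached patches.

```java
/**
 * Illustrative sketch of the 24/16/24-bit layout from the issue description.
 * All names are hypothetical; this is not Hive's actual implementation.
 */
final class SurrogateKeyLayout {
    static final int WRITE_ID_BITS = 24;
    static final int TASK_ATTEMPT_BITS = 16;
    static final int ROW_BITS = 24;

    // Largest value each field can hold: 2^24 - 1, 2^16 - 1 and 2^24 - 1.
    static final long MAX_WRITE_ID = (1L << WRITE_ID_BITS) - 1;
    static final long MAX_TASK_ATTEMPT = (1L << TASK_ATTEMPT_BITS) - 1;
    static final long MAX_ROW = (1L << ROW_BITS) - 1;

    private SurrogateKeyLayout() {
    }

    /** Packs the three fields into one long; rejects values outside their ranges. */
    static long pack(long writeId, long taskAttempt, long row) {
        check("write_id", writeId, MAX_WRITE_ID);
        check("task_attempt", taskAttempt, MAX_TASK_ATTEMPT);
        check("row", row, MAX_ROW);
        return (writeId << (TASK_ATTEMPT_BITS + ROW_BITS))
                | (taskAttempt << ROW_BITS)
                | row;
    }

    static long writeIdOf(long key) {
        return key >>> (TASK_ATTEMPT_BITS + ROW_BITS);
    }

    static long taskAttemptOf(long key) {
        return (key >>> ROW_BITS) & MAX_TASK_ATTEMPT;
    }

    static long rowOf(long key) {
        return key & MAX_ROW;
    }

    // Mirrors "if you hit any of those limits we can fail the query".
    private static void check(String name, long value, long max) {
        if (value < 0 || value > max) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }

    public static void main(String[] args) {
        long key = pack(1, 2, 3);
        System.out.println(writeIdOf(key) + " " + taskAttemptOf(key) + " " + rowOf(key));
    }
}
```

Under this assumed layout, each task would start its counter at row 0 for its 
own (write_id, task_attempt) pair and increment it per row, so two tasks or two 
queries cannot produce the same long as long as no field overflows.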



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619931#comment-16619931
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

OK, can you please create a jira for it, so that we don't forget about it?



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619928#comment-16619928
 ] 

Vineet Garg commented on HIVE-20536:


Oh, it slipped my mind! Unfortunately, MERGE doesn't allow a column schema 
specification with the INSERT, so DEFAULT isn't supported.



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619819#comment-16619819
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

[~vgarg] Don't we support default clause for merge?



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-18 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618941#comment-16618941
 ] 

Miklos Gergely commented on HIVE-20536:
---

[~ashutoshc] I've tested the function with a multi-table insert statement, and 
it worked fine. I couldn't test it with MERGE, because AFAIK MERGE currently 
doesn't support inserting into specified columns. Assume that t1 has 3 columns:
 * a int
 * b int
 * c bigint default surrogate_key()

In order to insert into t1, one should specify the columns like this:
{noformat}
INSERT INTO t1 (a, b) VALUES (1, 2);
{noformat}
The same thing cannot be done using MERGE; specifying (a, b) like this is not 
supported:
{noformat}
MERGE INTO t1
...
WHEN NOT MATCHED THEN INSERT (a, b) VALUES (t2.a, t2.b);
{noformat}
Inserting without specifying the columns results in an error message saying 
that there are fewer values than columns.

I suggest supporting this syntax.



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618318#comment-16618318
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12939989/HIVE-20536.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 14970 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13868/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13868/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13868/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12939989 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-17 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618284#comment-16618284
 ] 

Hive QA commented on HIVE-20536:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 13s{color} | {color:blue} ql in master has 2326 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 45s{color} | {color:red} ql: The patch generated 2 new + 639 unchanged - 0 fixed = 641 total (was 639) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 42s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13868/dev-support/hive-personality.sh |
| git revision | master / a782330 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13868/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13868/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-17 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618145#comment-16618145
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

Can you please create a Review Board request at https://reviews.apache.org/ 
for this? It makes reviewing code easier.



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616255#comment-16616255
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12939682/HIVE-20536.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14942 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=78)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13801/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13801/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13801/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12939682 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-15 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616250#comment-16616250
 ] 

Hive QA commented on HIVE-20536:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 14s{color} | {color:blue} ql in master has 2311 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 48s{color} | {color:red} ql: The patch generated 2 new + 639 unchanged - 0 fixed = 641 total (was 639) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 21s{color} | {color:red} ql generated 1 new + 2311 unchanged - 0 fixed = 2312 total (was 2311) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 41s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Unchecked/unconfirmed cast from org.apache.hadoop.hive.ql.exec.MapredContext to org.apache.hadoop.hive.ql.exec.tez.TezContext in org.apache.hadoop.hive.ql.udf.generic.GenericUDFSurrogateKey.configure(MapredContext)  At GenericUDFSurrogateKey.java:org.apache.hadoop.hive.ql.exec.tez.TezContext in org.apache.hadoop.hive.ql.udf.generic.GenericUDFSurrogateKey.configure(MapredContext)  At GenericUDFSurrogateKey.java:[line 101] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13801/dev-support/hive-personality.sh |
| git revision | master / 08d9083 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13801/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-13801/yetus/whitespace-eol.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-13801/yetus/new-findbugs-ql.html |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13801/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.
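
For readers unfamiliar with the FindBugs warning in the report above 
("Unchecked/unconfirmed cast" from MapredContext to TezContext), the usual way 
to address this class of warning is to confirm the runtime type before 
downcasting. The sketch below shows that pattern only. ExecContext and 
TezExecContext are hypothetical stand-ins for the Hive classes, taskAttempt() 
is an invented accessor, and nothing here is taken from the later patches, so 
it should not be read as the change that was actually made.

```java
/** Hypothetical stand-in for a general execution context (not Hive's MapredContext). */
class ExecContext {
}

/** Hypothetical stand-in for an engine-specific context (not Hive's TezContext). */
class TezExecContext extends ExecContext {
    private final int taskAttempt;

    TezExecContext(int taskAttempt) {
        this.taskAttempt = taskAttempt;
    }

    int taskAttempt() {
        return taskAttempt;
    }
}

final class CheckedCastSketch {
    private CheckedCastSketch() {
    }

    /**
     * Confirms the runtime type before downcasting, so that an unexpected
     * context fails with a clear message instead of a ClassCastException.
     */
    static int taskAttemptOf(ExecContext context) {
        if (!(context instanceof TezExecContext)) {
            String actual = context == null ? "null" : context.getClass().getSimpleName();
            throw new IllegalStateException("Expected a Tez context but got " + actual);
        }
        return ((TezExecContext) context).taskAttempt();
    }

    public static void main(String[] args) {
        System.out.println(taskAttemptOf(new TezExecContext(7)));
    }
}
```

Whether failing fast is the right behaviour for a non-Tez context is a design 
choice; the alternative would be a fallback path for other execution engines.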




[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613688#comment-16613688
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12939467/HIVE-20536.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13759/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13759/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13759/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12939467/HIVE-20536.02.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12939467 - PreCommit-HIVE-Build

> Add Surrogate Keys function to Hive
> ---
>
> Key: HIVE-20536
> URL: https://issues.apache.org/jira/browse/HIVE-20536
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Attachments: HIVE-20536.01.patch, HIVE-20536.02.patch
>
>
> Surrogate keys is an ability to generate and use unique integers for each row 
> in a table. If we have that ability then in conjunction with default clause 
> we can get surrogate keys functionality. Consider following ddl:
> create table t1 (a string, b bigint default unique_long());
> We already have default clause wherein you can specify a function to provide 
> values. So, what we need is udf which can generate unique longs for each row 
> across queries for a table. 
> Idea is to use write_id . This is a column in metastore table TXN_COMPONENTS 
> whose value is determined at compile time to be used during query execution. 
> Each query execution generates a new write_id. So, we can seed udf with this 
> value during compilation.
> Then we statically allocate ranges for each task from which it can draw the next 
> long. So, let's say we divvy up the 64-bit write_id such that the first 24 bits 
> belong to its original usage, that is txns. The next 16 bits are used for 
> task_attempts and the last 24 bits to generate a new long for each row. This 
> implies we can allow 17M txns, 65K tasks and 17M rows in a task. If you hit any 
> of those limits we can fail the query.
> Implementation wise: serialize write_id in initialize() of udf. Then during 
> execute() we find out what task_attempt current task is and use it along with 
> write_id() to get starting long and give a new value on each invocation of 
> execute().
> Here we are assuming write_id can be determined at compile time, which should 
> be the case but we need to figure out how to get handle to it.
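
The description above splits a 64-bit value into 24 bits for the write_id, 16 bits for the task attempt and 24 bits for a per-row counter. The following is a minimal sketch of that packing, assuming the write_id occupies the highest bits. The class name, field names and limit checks are illustrative assumptions, not the code in the attached patches.

```java
public final class SurrogateKeyLayoutSketch {
    static final int WRITE_ID_BITS = 24;
    static final int TASK_ID_BITS = 16;
    static final int ROW_ID_BITS = 24;

    // Largest value that fits in each field.
    static final long MAX_WRITE_ID = (1L << WRITE_ID_BITS) - 1; // about 17M
    static final long MAX_TASK_ID = (1L << TASK_ID_BITS) - 1;   // about 65K
    static final long MAX_ROW_ID = (1L << ROW_ID_BITS) - 1;     // about 17M

    private final long writeId;
    private final long taskId;
    private long rowId = 0;

    public SurrogateKeyLayoutSketch(long writeId, long taskId) {
        // Fail early when a value does not fit into its bit range.
        if (writeId < 0 || writeId > MAX_WRITE_ID) {
            throw new IllegalArgumentException("write id out of range: " + writeId);
        }
        if (taskId < 0 || taskId > MAX_TASK_ID) {
            throw new IllegalArgumentException("task id out of range: " + taskId);
        }
        this.writeId = writeId;
        this.taskId = taskId;
    }

    // Returns the next key for this (writeId, taskId) pair: the two ids form
    // a fixed prefix and the row counter fills the low 24 bits.
    public long next() {
        if (rowId > MAX_ROW_ID) {
            throw new IllegalStateException("row id limit exceeded");
        }
        long key = (writeId << (TASK_ID_BITS + ROW_ID_BITS))
                | (taskId << ROW_ID_BITS)
                | rowId;
        rowId++;
        return key;
    }
}
```

Because the three fields occupy disjoint bit ranges, two keys collide only if all three components match. With this layout a write_id of 2^23 or more sets the sign bit, so the resulting long is negative but still unique.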



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613686#comment-16613686
 ] 

Hive QA commented on HIVE-20536:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12939467/HIVE-20536.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 14939 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] (batchId=78)
org.apache.hive.jdbc.miniHS2.TestHs2ConnectionMetricsBinary.testOpenConnectionMetrics (batchId=255)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/13758/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/13758/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-13758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12939467 - PreCommit-HIVE-Build






[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-13 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613622#comment-16613622
 ] 

Hive QA commented on HIVE-20536:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  8s{color} | {color:blue} ql in master has 2311 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 46s{color} | {color:red} ql: The patch generated 4 new + 639 unchanged - 0 fixed = 643 total (was 639) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13758/dev-support/hive-personality.sh |
| git revision | master / 265 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-13758/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-13758/yetus/whitespace-eol.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13758/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-12 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612709#comment-16612709
 ] 

Miklos Gergely commented on HIVE-20536:
---

Fixed; writeId is now set directly on GenericUDFSurrogateKey. I will add unit 
tests too.






[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-12 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612695#comment-16612695
 ] 

Ashutosh Chauhan commented on HIVE-20536:
-

Adding tableDesc to GenericUDF is not a good idea. It's a public interface, and 
exposing internal structures there isn't useful. Instead, in genFileSinkDesc(), 
test for the surrogateKey udf and, if found, set the writeId directly on that udf.
If a qtest is not possible, then let's write a junit test for the udf and mock the 
Context object if needed.
Also, can you create an RB for this?






[jira] [Commented] (HIVE-20536) Add Surrogate Keys function to Hive

2018-09-12 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612677#comment-16612677
 ] 

Miklos Gergely commented on HIVE-20536:
---

No qtest was added, as the task ID is not available in qtests.



