date:20171119

[jira] [Updated] (HIVE-17898) Explain plan output enhancement

2017-11-19 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17898:
---
Status: Patch Available  (was: Open)

> Explain plan output enhancement
> ---
>
> Key: HIVE-17898
> URL: https://issues.apache.org/jira/browse/HIVE-17898
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17898.1.patch, HIVE-17898.2.patch, 
> HIVE-17898.3.patch, HIVE-17898.4.patch, HIVE-17898.5.patch, 
> HIVE-17898.6.patch, HIVE-17898.7.patch
>
>
> We would like to enhance the explain plan output to display additional 
> information, e.g.:
> The TableScan operator should show the following additional info:
> * Actual table name (currently only alias name is displayed)
> * Database name
> * Column names being scanned



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-17898) Explain plan output enhancement

2017-11-19 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17898:
---
Status: Open  (was: Patch Available)

> Explain plan output enhancement
> ---
>
> Key: HIVE-17898
> URL: https://issues.apache.org/jira/browse/HIVE-17898
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17898.1.patch, HIVE-17898.2.patch, 
> HIVE-17898.3.patch, HIVE-17898.4.patch, HIVE-17898.5.patch, 
> HIVE-17898.6.patch, HIVE-17898.7.patch
>
>
> We would like to enhance the explain plan output to display additional 
> information, e.g.:
> The TableScan operator should show the following additional info:
> * Actual table name (currently only alias name is displayed)
> * Database name
> * Column names being scanned




[jira] [Updated] (HIVE-18043) Vectorization: Support List type in MapWork

2017-11-19 Thread Colin Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18043:

Attachment: HIVE-18043.001.patch

The implementation is based on the patch for HIVE-16198.
[~Ferd], [~vihangk1], as discussed in HIVE-17931, you can get the q-tests from 
this patch.

> Vectorization: Support List type in MapWork
> ---
>
> Key: HIVE-18043
> URL: https://issues.apache.org/jira/browse/HIVE-18043
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18043.001.patch
>
>
> Support for complex types in vectorization was completed in HIVE-16589, but the 
> List type is still not supported in MapWork. It should be supported to improve 
> performance when vectorization is enabled.




[jira] [Updated] (HIVE-18043) Vectorization: Support List type in MapWork

2017-11-19 Thread Colin Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18043:

Status: Patch Available  (was: Open)

> Vectorization: Support List type in MapWork
> ---
>
> Key: HIVE-18043
> URL: https://issues.apache.org/jira/browse/HIVE-18043
> Project: Hive
>  Issue Type: Improvement
>Reporter: Colin Ma
>Assignee: Colin Ma
> Attachments: HIVE-18043.001.patch
>
>
> Support for complex types in vectorization was completed in HIVE-16589, but the 
> List type is still not supported in MapWork. It should be supported to improve 
> performance when vectorization is enabled.




[jira] [Commented] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258856#comment-16258856
 ] 

liyunzhang commented on HIVE-18080:
---

[~gopalv]:
{{-prof perfasm}} depends on 
[PrintAssembly|https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly], 
and PrintAssembly in turn depends on the [Kenai 
project|http://www.oracle.com/splash/kenai.com/decommissioning/index.html], 
which Oracle has shut down. 
I downloaded hsdis.so from another source, and I am not sure whether this 
outdated hsdis.so can disassemble AVX512 instructions.
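
A quick way to see whether the JVM can load the downloaded hsdis library at all is to look at what HotSpot reports when PrintAssembly is switched on. The sketch below is only an illustration: the function name {{classify_hsdis}} is made up, and it assumes HotSpot prints its usual "Could not load hsdis-..." message when the library cannot be loaded. It says nothing about whether the disassembler understands AVX512 instructions.

```shell
# Hypothetical helper: reads JVM output on stdin and reports whether HotSpot
# complained that the hsdis disassembler library could not be loaded.
classify_hsdis() {
  if grep -q 'Could not load hsdis'; then
    echo "missing"
  else
    echo "no-warning"
  fi
}

# Usage (assumes a JDK on PATH and hsdis reachable via LD_LIBRARY_PATH):
#   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -version 2>&1 | classify_hsdis
```

A "no-warning" result only means the message was not seen; checking the disassembly of a method known to use AVX512 would still be needed.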


> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is degradation in 
> IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and 
> IfExprRepeatingLongColumnLongColumnBench. I am confused about why these 
> IfExprLongColumnLongColumnBench cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to 
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
>   java -server -XX:UseAVX=3 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> for i in 0 1 2; do
>   java -server -XX:UseAVX=2 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> {code}
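
The last column of the table is the relative change (AVX2-AVX512)/AVX2, where a positive value means AVX512 was faster. As a small sketch of that arithmetic, the helper below (the name {{rel_change}} is made up for illustration) takes the two average times in us/op and prints the percentage rounded to one decimal place, so the last digit can differ slightly from the values quoted in the table.

```shell
# Relative change between two average times, as a percentage of the AVX2 time:
# (AVX2 - AVX512) / AVX2 * 100. Positive means AVX512 was faster.
rel_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

rel_change 122510 87014      # ColAndColBench
rel_change 1325759 1436073   # IfExprLongColumnLongColumnBench
```

The same call can be applied to the averages extracted from the log.logic.avx2.single.* and log.logic.avx3.single.* files.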




[jira] [Comment Edited] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258853#comment-16258853
 ] 

liyunzhang edited comment on HIVE-18080 at 11/20/17 6:38 AM:
-

[~gopalv]: I used the following command with {{-prof perfasm}} to run 
VectorizedLogicBench#IfExprLongColumnLongColumnBench with AVX1:
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar -prof perfasm \
  org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 \
  -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

The 
[log.logic.avx1.single.0|https://issues.apache.org/jira/secure/attachment/12898421/log.logic.avx1.single.0]
 is attached; it contains this warning:
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}


was (Author: kellyzly):
[~gopal]: using following command with {{-prof perfasm}} to run the 
VectorizedLogicBench#IfExprLongColumnLongColumnBench in AVX1
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar  -prof perfasm 
org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 
1 -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

the output attached, find some warning
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2

[jira] [Updated] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang updated HIVE-18080:
--
Attachment: log.logic.avx1.single.0

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log.logic.avx1.single.0, log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is degradation in 
> IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and 
> IfExprRepeatingLongColumnLongColumnBench. I am confused about why these 
> IfExprLongColumnLongColumnBench cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to 
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
>   java -server -XX:UseAVX=3 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> for i in 0 1 2; do
>   java -server -XX:UseAVX=2 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> {code}
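
The script above starts the JVM in the background and pins it afterwards with {{taskset -cp 1 $pid}}. As far as I understand taskset, the {{-p}} form without {{-a}} changes the affinity of the task with that PID only, so JVM threads that were already created before the call may keep their previous affinity. A possible alternative, sketched below under that assumption, is to launch the command under taskset from the start so every thread inherits the mask. The wrapper name {{run_pinned}} is hypothetical, and it falls back to an unpinned run when taskset is missing or the CPU cannot be selected.

```shell
# Hypothetical wrapper: run a command pinned to one CPU from the very start,
# so all threads it creates inherit the affinity mask.
# Usage: run_pinned <cpu> <command> [args...]
run_pinned() {
  cpu=$1
  shift
  if command -v taskset >/dev/null 2>&1 && taskset -c "$cpu" true >/dev/null 2>&1; then
    taskset -c "$cpu" "$@"
  else
    # taskset is unavailable or the CPU is not allowed for this process.
    echo "warning: cannot pin to CPU $cpu; running unpinned" >&2
    "$@"
  fi
}

# Example (benchmark arguments elided, they are the same as in the script above):
#   run_pinned 1 java -server -XX:UseAVX=3 -jar benchmarks.jar ... >log.logic.avx3.single.0
```

Running in the foreground this way also removes the need for the {{& export pid=$!}} and {{wait $pid}} steps.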




[jira] [Comment Edited] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258853#comment-16258853
 ] 

liyunzhang edited comment on HIVE-18080 at 11/20/17 6:35 AM:
-

[~gopal]: I used the following command with {{-prof perfasm}} to run 
VectorizedLogicBench#IfExprLongColumnLongColumnBench with AVX1:
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar -prof perfasm \
  org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 \
  -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

The output is attached; it contains this warning:
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}


was (Author: kellyzly):
[~gopal]: using following command to run the 
VectorizedLogicBench#IfExprLongColumnLongColumnBench in AVX1
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar  -prof perfasm 
org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 
1 -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

the output attached, find some warning
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op ||  (AVX2-AVX512)/AVX2||
> |ColAndColBench|122510| 87014| 28.9%|
> |IfExprLongColumnLongColumnBench | 1325759| 1436073| -8.3% |
>

[jira] [Comment Edited] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258853#comment-16258853
 ] 

liyunzhang edited comment on HIVE-18080 at 11/20/17 6:35 AM:
-

[~gopal]: I used the following command to run 
VectorizedLogicBench#IfExprLongColumnLongColumnBench with AVX1:
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
i=0
java -server -XX:UseAVX=1 -jar benchmarks.jar -prof perfasm \
  org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 \
  -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

The output is attached; it contains this warning:
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}


was (Author: kellyzly):
[~gopal]: using following command to run the 
VectorizedLogicBench#IfExprLongColumnLongColumnBench in AVX1
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
java -server -XX:UseAVX=1 -jar benchmarks.jar  -prof perfasm 
org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 
1 -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

the output attached, find some warning
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op ||  (AVX2-AVX512)/AVX2||
> |ColAndColBench|122510| 87014| 28.9%|
> |IfExprLongColumnLongColumnBench | 1325759| 1436073| -8.3% |
>

[jira] [Commented] (HIVE-18080) Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled

2017-11-19 Thread liyunzhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258853#comment-16258853
 ] 

liyunzhang commented on HIVE-18080:
---

[~gopal]: I used the following command to run 
VectorizedLogicBench#IfExprLongColumnLongColumnBench with AVX1:
{code}
export JAVA_HOME=/home/zly/sr601/jdk-9.0.1/
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/zly/sr601/jdk-9.0.1/mylib
java -server -XX:UseAVX=1 -jar benchmarks.jar -prof perfasm \
  org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 \
  -bm avgt -tu us >log.logic.avx1.single.$i & export pid=$!
taskset -cp 1 $pid
wait $pid
{code}

The output is attached; it contains this warning:
{code}
PrintAssembly processed: 51105 total address lines.
Perf output processed (skipped 1.020 seconds):
 Column 1: cycles (0 events)
 Column 2: instructions (0 events)

[Hottest Regions]...

[Hottest Methods (after inlining)]..

[Distribution by Area]..

WARNING: The perf event count is suspiciously low (0). The performance data might be
inaccurate or misleading. Try to do the profiling again, or tune up the sampling frequency.
{code}
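
An event count of 0 for both cycles and instructions suggests that perf collected no samples for the run. One possibility, which nothing in the log confirms, is that the hardware counters are not usable in this environment (for example inside a virtual machine, or because of perf permissions). A rough way to check is to ask {{perf stat}} for the same two events on a trivial command and look for events it marks as not supported or not counted. The helper name {{check_perf_events}} below is made up for illustration.

```shell
# Hypothetical helper: reads `perf stat` output on stdin and reports whether
# any requested event was marked as unsupported or not counted.
check_perf_events() {
  if grep -q -e '<not supported>' -e '<not counted>'; then
    echo "events-unavailable"
  else
    echo "events-ok"
  fi
}

# Usage (assumes the perf tool is installed; perf stat reports on stderr):
#   perf stat -e cycles,instructions sleep 1 2>&1 | check_perf_events
```

If the events turn out to be available, the next thing to try is what the warning itself suggests: repeat the profiling or raise the sampling frequency.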

> Performance degradation on 
> VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> --
>
> Key: HIVE-18080
> URL: https://issues.apache.org/jira/browse/HIVE-18080
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang
> Attachments: log_logic.avx1.part
>
>
> Use  Xeon(R) Platinum 8180 CPU to test the performance of 
> [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, since JDK9 enables AVX512. 
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement. 
> Performance appears to improve by 20%+ for the cases in 
> {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, 
> except for 
> {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} 
> and 
> {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}. 
> On the Skylake CPU, the results for VectorizedLogicBench are as follows:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is degradation in 
> IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and 
> IfExprRepeatingLongColumnLongColumnBench. I am confused about why these 
> IfExprLongColumnLongColumnBench cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to 
> avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
>   java -server -XX:UseAVX=3 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> for i in 0 1 2; do
>   java -server -XX:UseAVX=2 -jar benchmarks.jar \
>     org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 \
>     -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
>   taskset -cp 1 $pid
>   wait $pid
> done
> {code}




[jira] [Updated] (HIVE-18104) Issue in HIVE Update Command for set columns

2017-11-19 Thread Ravi Ranjan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Ranjan updated HIVE-18104:
---
Description: 
When updating a table, an error is raised when a wrong column name is used in the 
where clause, but the MapReduce job executes successfully when the column name in the 
set clause is wrong, though no value gets updated.

hive> describe test_table;
OK
run_site        varchar(50)
run_year        int
run_month       int
data_loaded_yn  varchar(1)
run_date        timestamp
message         string
datetime        timestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)


hive> update test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)


VERTICES      STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..      SUCCEEDED      2          2        0        0       0       0
Reducer 2 ..  SUCCEEDED      2          2        0        0       0       0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table test_table stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds



  was:
When Updating a table, error comes in when a wrong column name is entered in 
where clause but Mapreduce executes successfully when column name in set clause 
is wrong, though no value gets updated.

hive> describe test_table;
OK
run_sitevarchar(50)
run_yearint
run_month   int
data_loaded_yn  varchar(1)
run_datetimestamp
message string
datetimetimestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)


hive> test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..   SUCCEEDED  2  200   0   0
Reducer 2 ..   SUCCEEDED  2  200   0   0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table test_table stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds




> Issue in HIVE Update Command for set columns
> 
>
> Key: HIVE-18104
> URL: https://issues.apache.org/jira/browse/HIVE-18104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Ravi Ranjan
>Priority: Critical
>
> When updating a table, an error is raised when a wrong column name is used in the 
> where clause, but the MapReduce job executes successfully when the column name in the 
> set clause is wrong, though no value gets updated.
> hive> describe test_table;
> OK
> run_site        varchar(50)
> run_year        int
> run_month       int
> data_loaded_yn  varchar(1)
> run_date        timestamp
> message         string
> datetime        timestamp
> Time taken: 0.169 seconds, Fetched: 10 row(s)
> 
> hive> update test_table set abc='Y' where message='Processing';
> Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1508354216914_35481)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  2  200   0  
>  0
> Reducer 2 ..   SUCCEEDED  2  2

[jira] [Updated] (HIVE-18104) Issue in HIVE Update Command for set columns

2017-11-19 Thread Ravi Ranjan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Ranjan updated HIVE-18104:
---
Description: 
When updating a table, an error is raised when a wrong column name is used in the 
where clause, but the MapReduce job executes successfully when the column name in the 
set clause is wrong, though no value gets updated.

hive> describe test_table;
OK
run_site        varchar(50)
run_year        int
run_month       int
data_loaded_yn  varchar(1)
run_date        timestamp
message         string
datetime        timestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)


hive> test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)


VERTICES      STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..      SUCCEEDED      2          2        0        0       0       0
Reducer 2 ..  SUCCEEDED      2          2        0        0       0       0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table test_table stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds



  was:
When Updating a table, error comes in when a wrong column name is entered in 
where clause but Mapreduce executes successfully when column name in set clause 
is wrong and no value gets updated.

hive> describe test_table;
OK
run_sitevarchar(50)
run_yearint
run_month   int
data_loaded_yn  varchar(1)
run_datetimestamp
message string
datetimetimestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)


hive> test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..   SUCCEEDED  2  200   0   0
Reducer 2 ..   SUCCEEDED  2  200   0   0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table test_table stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds




> Issue in HIVE Update Command for set columns
> 
>
> Key: HIVE-18104
> URL: https://issues.apache.org/jira/browse/HIVE-18104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Ravi Ranjan
>Priority: Critical
>
> When updating a table, an error is raised when a wrong column name is used in the 
> where clause, but the MapReduce job executes successfully when the column name in the 
> set clause is wrong, though no value gets updated.
> hive> describe test_table;
> OK
> run_site        varchar(50)
> run_year        int
> run_month       int
> data_loaded_yn  varchar(1)
> run_date        timestamp
> message         string
> datetime        timestamp
> Time taken: 0.169 seconds, Fetched: 10 row(s)
> 
> hive> test_table set abc='Y' where message='Processing';
> Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1508354216914_35481)
> 
> VERTICES       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> 
> Map 1 ..       SUCCEEDED      2          2        0        0       0       0
> Reducer 2 ..   SUCCEEDED  2  200

[jira] [Updated] (HIVE-18104) Issue in HIVE Update Command for set columns

2017-11-19 Thread Ravi Ranjan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Ranjan updated HIVE-18104:
---
Description: 
When updating a table, Hive raises an error if a wrong column name is used in 
the WHERE clause, but the job runs successfully when the column name in the 
SET clause is wrong, and no value gets updated.

hive> describe test_table;
OK
run_site        varchar(50)
run_year        int
run_month       int
data_loaded_yn  varchar(1)
run_date        timestamp
message         string
datetime        timestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)


hive> test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)


VERTICES       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..       SUCCEEDED      2          2        0        0       0       0
Reducer 2 ..   SUCCEEDED      2          2        0        0       0       0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table test_table stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds



  was:
When updating a table, Hive raises an error if a wrong column name is used in 
the WHERE clause, but the job runs successfully when the column name in the 
SET clause is wrong, and no value gets updated.

hive> describe test_table;
OK
run_site        varchar(50)
run_year        int
run_month       int
data_loaded_yn  varchar(1)
run_date        timestamp
message         string
datetime        timestamp
Time taken: 0.169 seconds, Fetched: 10 row(s)

hive> test_table set abc='Y' where message='Processing';
Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id 
application_1508354216914_35481)

VERTICES       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..       SUCCEEDED      2          2        0        0       0       0
Reducer 2 ..   SUCCEEDED      2          2        0        0       0       0

VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.52 s

Loading data to table test_table
Table astir_mi_db.astir_hv_lt_scenario_run stats: [numFiles=39, numRows=3, totalSize=56417, rawDataSize=0]
OK
Time taken: 10.517 seconds


> Issue in HIVE Update Command for set columns
> 
>
> Key: HIVE-18104
> URL: https://issues.apache.org/jira/browse/HIVE-18104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Ravi Ranjan
>Priority: Critical
>
> When updating a table, Hive raises an error if a wrong column name is used in 
> the WHERE clause, but the job runs successfully when the column name in the 
> SET clause is wrong, and no value gets updated.
> hive> describe test_table;
> OK
> run_site        varchar(50)
> run_year        int
> run_month       int
> data_loaded_yn  varchar(1)
> run_date        timestamp
> message         string
> datetime        timestamp
> Time taken: 0.169 seconds, Fetched: 10 row(s)
> 
> hive> test_table set abc='Y' where message='Processing';
> Query ID = 20171120052859_d95524f8-a9d3-48ad-aa84-2932696d3432
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1508354216914_35481)
> 
> VERTICES       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> 
> Map 1 ..       SUCCEEDED      2          2        0        0       0       0
> Reducer 2 ..   SUCCEEDED  2  200

[jira] [Comment Edited] (HIVE-17902) add notions of default pool and start adding unmanaged mapping

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248783#comment-16248783
 ] 

Lefty Leverenz edited comment on HIVE-17902 at 11/20/17 4:28 AM:
-

Doc note:  This adds *hive.metastore.wm.default.pool.size* to HiveConf.java, so 
it needs to be documented in the wiki.  (Perhaps the LLAP section of 
Configuration Properties will have a subsection for workload management.)

* [Configuration Properties -- LLAP | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

Added a TODOC3.0 label.

Update 19/Nov/17:  Also document the non-reserved keywords DEFAULT and POOL for 
3.0.0 in the DDL doc.

* [DDL -- Keywords | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords]


was (Author: le...@hortonworks.com):
Doc note:  This adds *hive.metastore.wm.default.pool.size* to HiveConf.java, so 
it needs to be documented in the wiki.  (Perhaps the LLAP section of 
Configuration Properties will have a subsection for workload management.)

* [Configuration Properties -- LLAP | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

Added a TODOC3.0 label.

> add notions of default pool and start adding unmanaged mapping
> --
>
> Key: HIVE-17902
> URL: https://issues.apache.org/jira/browse/HIVE-17902
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17902.01.patch, HIVE-17902.02.patch, 
> HIVE-17902.03.patch, HIVE-17902.04.patch, HIVE-17902.05.patch, 
> HIVE-17902.06.patch, HIVE-17902.07.patch, HIVE-17902.08.patch, 
> HIVE-17902.09.patch, HIVE-17902.10.patch, HIVE-17902.patch
>
>
> This is needed to map queries between WM and non-WM execution



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17932) Remove option to control partition level basic stats fetching

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258798#comment-16258798
 ] 

Lefty Leverenz commented on HIVE-17932:
---

Thanks Zoltan, I've removed the TODOC3.0 label.

> Remove option to control partition level basic stats fetching
> -
>
> Key: HIVE-17932
> URL: https://issues.apache.org/jira/browse/HIVE-17932
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 3.0.0
>
> Attachments: HIVE-17932.01.patch
>
>
> Disabling the fetching of partition stats 
> ({{hive.stats.fetch.partition.stats}}) may cause problems for partitioned 
> tables. The user might just want to disable the CBO instead of tweaking the 
> fetching of partition stats.




[jira] [Commented] (HIVE-17965) Remove HIVELIMITTABLESCANPARTITION support

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258797#comment-16258797
 ] 

Lefty Leverenz commented on HIVE-17965:
---

Thanks Zoltan, I've removed the TODOC3.0 label.

> Remove HIVELIMITTABLESCANPARTITION support
> --
>
> Key: HIVE-17965
> URL: https://issues.apache.org/jira/browse/HIVE-17965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17965.01.patch
>
>
> HIVE-13884 marked it as deprecated




[jira] [Updated] (HIVE-17965) Remove HIVELIMITTABLESCANPARTITION support

2017-11-19 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-17965:
--
Labels:   (was: TODOC3.0)

> Remove HIVELIMITTABLESCANPARTITION support
> --
>
> Key: HIVE-17965
> URL: https://issues.apache.org/jira/browse/HIVE-17965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17965.01.patch
>
>
> HIVE-13884 marked it as deprecated




[jira] [Updated] (HIVE-17932) Remove option to control partition level basic stats fetching

2017-11-19 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-17932:
--
Labels:   (was: TODOC3.0)

> Remove option to control partition level basic stats fetching
> -
>
> Key: HIVE-17932
> URL: https://issues.apache.org/jira/browse/HIVE-17932
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 3.0.0
>
> Attachments: HIVE-17932.01.patch
>
>
> Disabling the fetching of partition stats 
> ({{hive.stats.fetch.partition.stats}}) may cause problems for partitioned 
> tables. The user might just want to disable the CBO instead of tweaking the 
> fetching of partition stats.




[jira] [Commented] (HIVE-17528) Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-11-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258782#comment-16258782
 ] 

Hive QA commented on HIVE-17528:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12898408/HIVE-17528.5.patch

{color:green}SUCCESS:{color} +1 due to 30 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11443 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=78)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=103)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7915/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7915/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7915/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12898408 - PreCommit-HIVE-Build

> Add more q-tests for Hive-on-Spark with Parquet vectorized reader
> -
>
> Key: HIVE-17528
> URL: https://issues.apache.org/jira/browse/HIVE-17528
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Ferdinand Xu
> Attachments: HIVE-17528.1.patch, HIVE-17528.2.patch, 
> HIVE-17528.3.patch, HIVE-17528.4.patch, HIVE-17528.5.patch, HIVE-17528.patch
>
>
> Most of the vectorization-related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage on a different combination of engine and 
> file format. We can model existing q-tests using Parquet tables and run them 
> using TestSparkCliDriver.




[jira] [Commented] (HIVE-14495) Add SHOW MATERIALIZED VIEWS statement

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258775#comment-16258775
 ] 

Lefty Leverenz commented on HIVE-14495:
---

Doc note:  This needs to be documented in the wiki.

* [DDL -- SHOW | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show]

Added a TODOC3.0 label.

> Add SHOW MATERIALIZED VIEWS statement
> -
>
> Key: HIVE-14495
> URL: https://issues.apache.org/jira/browse/HIVE-14495
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-14495.01.patch, HIVE-14495.patch
>
>
> In the spirit of {{SHOW TABLES}}, we should support the following statement:
> {code:sql}
> SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
> {code}
> In contrast to {{SHOW TABLES}}, this command would only list the materialized 
> views.
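
A minimal sketch of how the 'identifier_with_wildcards' argument could be 
matched against view names. It assumes the same wildcard rules documented for 
SHOW TABLES ('*' for any run of characters, '|' for a choice between 
alternatives); the issue text does not spell this out. The function names and 
the sample view names are hypothetical and are not Hive code.

```python
import re
from typing import List


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SHOW-style wildcard pattern into a compiled regex.

    Assumed rules: '*' matches any run of characters, '|' separates
    alternatives. Every other character is matched literally.
    """
    alternatives = []
    for alt in pattern.split("|"):
        # Escape the literal pieces and join them with '.*' where '*' stood.
        alternatives.append(".*".join(re.escape(piece) for piece in alt.split("*")))
    return re.compile("(?:" + "|".join(alternatives) + ")")


def filter_names(names: List[str], pattern: str) -> List[str]:
    """Keep only the names that match the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return [name for name in names if rx.fullmatch(name)]


# Hypothetical materialized view names, for illustration only.
views = ["mv_sales", "mv_users", "tmp_mv"]
print(filter_names(views, "mv_*"))
print(filter_names(views, "mv_sales|tmp*"))
```

Matching the whole name with fullmatch keeps a pattern such as 'mv_*' from 
also selecting names that merely contain 'mv_' somewhere in the middle.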




[jira] [Commented] (HIVE-15018) ALTER rewriting flag in materialized view

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258776#comment-16258776
 ] 

Lefty Leverenz commented on HIVE-15018:
---

Doc note:  This needs to be documented in the wiki.

* [DDL | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL]

Added a TODOC3.0 label.

> ALTER rewriting flag in materialized view 
> --
>
> Key: HIVE-15018
> URL: https://issues.apache.org/jira/browse/HIVE-15018
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-15018.01.patch, HIVE-15018.patch
>
>
> We should extend the ALTER statement in case we want to change the rewriting 
> behavior of the materialized view after we have created it.
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name DISABLE REWRITE;
> {code}
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE REWRITE;
> {code}




[jira] [Updated] (HIVE-14495) Add SHOW MATERIALIZED VIEWS statement

2017-11-19 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14495:
--
Labels: TODOC3.0  (was: )

> Add SHOW MATERIALIZED VIEWS statement
> -
>
> Key: HIVE-14495
> URL: https://issues.apache.org/jira/browse/HIVE-14495
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-14495.01.patch, HIVE-14495.patch
>
>
> In the spirit of {{SHOW TABLES}}, we should support the following statement:
> {code:sql}
> SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
> {code}
> In contrast to {{SHOW TABLES}}, this command would only list the materialized 
> views.




[jira] [Updated] (HIVE-15018) ALTER rewriting flag in materialized view

2017-11-19 Thread Lefty Leverenz (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15018:
--
Labels: TODOC3.0  (was: )

> ALTER rewriting flag in materialized view 
> --
>
> Key: HIVE-15018
> URL: https://issues.apache.org/jira/browse/HIVE-15018
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-15018.01.patch, HIVE-15018.patch
>
>
> We should extend the ALTER statement in case we want to change the rewriting 
> behavior of the materialized view after we have created it.
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name DISABLE REWRITE;
> {code}
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name ENABLE REWRITE;
> {code}




[jira] [Updated] (HIVE-16756) Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: / by zero"

2017-11-19 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16756:
---
   Resolution: Fixed
Fix Version/s: 2.4.0
               3.0.0
       Status: Resolved  (was: Patch Available)

Patch merged to master and branch-2. Thanks for the review [~mmccline]

> Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: 
> / by zero"
> 
>
> Key: HIVE-16756
> URL: https://issues.apache.org/jira/browse/HIVE-16756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16756.01.patch, HIVE-16756.02.patch, 
> HIVE-16756.03.patch, HIVE-16756.05-branch-2.patch, 
> HIVE-16756.06-branch-2.patch
>
>
> vectorization_div0.q needs to test the long data type.
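
To make the failure mode in the title concrete, here is a small sketch of a 
column-by-column modulo that guards against a zero divisor instead of raising. 
It is written in Python for illustration and is not the generated 
LongColModuloLongColumn code; the function names are hypothetical, and 
producing NULL (None) for a zero divisor is an assumption about the intended 
result.

```python
from typing import List, Optional


def java_style_mod(a: int, b: int) -> int:
    """Remainder that takes the sign of the dividend, like Java's % on longs."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def long_col_modulo_long_col(
    left: List[Optional[int]], right: List[Optional[int]]
) -> List[Optional[int]]:
    """Row-wise left % right over two equally sized columns.

    A row becomes None (NULL) when either input is NULL or the divisor is 0,
    so no arithmetic exception can escape from the loop.
    """
    out: List[Optional[int]] = []
    for a, b in zip(left, right):
        if a is None or b is None or b == 0:
            out.append(None)  # assumed NULL result for a zero divisor
        else:
            out.append(java_style_mod(a, b))
    return out


print(long_col_modulo_long_col([7, -7, 5, None], [3, 3, 0, 2]))
```

A q-test for the long data type would presumably need at least one row with a 
zero divisor, as in the third row above, to exercise this path.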




[jira] [Commented] (HIVE-16756) Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: / by zero"

2017-11-19 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258764#comment-16258764
 ] 

Vihang Karajgaonkar commented on HIVE-16756:


{{vectorized_ptf}} has been failing for a while on branch-2. Other failures are 
unrelated to this patch.

> Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: 
> / by zero"
> 
>
> Key: HIVE-16756
> URL: https://issues.apache.org/jira/browse/HIVE-16756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Attachments: HIVE-16756.01.patch, HIVE-16756.02.patch, 
> HIVE-16756.03.patch, HIVE-16756.05-branch-2.patch, 
> HIVE-16756.06-branch-2.patch
>
>
> vectorization_div0.q needs to test the long data type.




[jira] [Commented] (HIVE-17964) HoS: some spark configs doesn't require re-creating a session

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258760#comment-16258760
 ] 

Lefty Leverenz commented on HIVE-17964:
---

Doc note:  This adds *hive.spark.rsc.conf.list* to HiveConf.java, so it needs 
to be documented in the wiki.

* [Configuration Properties -- Spark | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark]

Added a TODOC3.0 label.

> HoS: some spark configs doesn't require re-creating a session
> -
>
> Key: HIVE-17964
> URL: https://issues.apache.org/jira/browse/HIVE-17964
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17964.1.patch, HIVE-17964.2.patch, 
> HIVE-17964.3.patch
>
>
> I guess the {{hive.spark.}} configs were initially intended for the RSC. 
> Therefore when they're changed, we'll re-create the session for them to take 
> effect. There're some configs not related to RSC that also start with 
> {{hive.spark.}}. We'd better rename them so that we don't unnecessarily 
> re-create sessions, which is usually time consuming.
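
The difference between the two decision rules can be sketched as follows. This 
is only an illustration in Python: the prefix rule is the behaviour the 
description complains about, while the explicit-list rule is one assumed 
alternative. The config names below are made up for the example and are not 
real Hive properties.

```python
from typing import Iterable, Set


def needs_new_session_by_prefix(changed: Iterable[str], prefix: str = "hive.spark.") -> bool:
    """Prefix rule: any changed key starting with the prefix forces a new session."""
    return any(key.startswith(prefix) for key in changed)


def needs_new_session_by_list(changed: Iterable[str], rsc_related: Set[str]) -> bool:
    """List rule: only keys explicitly marked as RSC-related force a new session."""
    return any(key in rsc_related for key in changed)


# Hypothetical config names, for illustration only.
rsc_related = {"hive.spark.example.rsc.setting"}
changed = ["hive.spark.example.unrelated.flag"]

print(needs_new_session_by_prefix(changed))             # prefix rule re-creates the session
print(needs_new_session_by_list(changed, rsc_related))  # list rule does not
```

With an explicit list, a config that merely shares the prefix no longer pays 
the cost of re-creating the session.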




[jira] [Commented] (HIVE-14560) Support exchange partition between s3 and hdfs tables

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258756#comment-16258756
 ] 

Lefty Leverenz commented on HIVE-14560:
---

Should this be documented in the wiki, or is it just a bug fix?

> Support exchange partition between s3 and hdfs tables
> -
>
> Key: HIVE-14560
> URL: https://issues.apache.org/jira/browse/HIVE-14560
> Project: Hive
>  Issue Type: Bug
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 3.0.0
>
> Attachments: HIVE-14560.02.patch, HIVE-14560.patch
>
>
> {code}
> alter table s3_tbl exchange partition (country='USA', state='CA') with table 
> hdfs_tbl;
> {code}
> results in:
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got 
> exception: java.lang.IllegalArgumentException Wrong FS: 
> s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: 
> hdfs://localhost:9000) (state=08S01,code=1)
> {code}
> because the check for whether the s3 destination table path exists is run 
> against the hdfs filesystem.
> Furthermore, exchanging a partition between s3 and hdfs fails because the 
> hdfs rename operation is not supported across filesystems. The fix uses 
> copy + deletion when the filesystems differ.
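
The decision described above can be sketched as follows, in Python rather than 
in Hive's Java code. Two locations are treated as being on the same filesystem 
when their URI scheme and authority match; otherwise the move falls back to 
copy followed by delete. The function names and the hdfs path in the example 
are hypothetical; the s3a path is the one from the error message.

```python
from urllib.parse import urlsplit


def same_filesystem(src_uri: str, dst_uri: str) -> bool:
    """Compare scheme and authority, e.g. ('s3a', 'hive-on-s3') vs ('hdfs', 'localhost:9000')."""
    src, dst = urlsplit(src_uri), urlsplit(dst_uri)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


def choose_move_strategy(src_uri: str, dst_uri: str) -> str:
    """A rename only works inside one filesystem; otherwise copy, then delete the source."""
    return "rename" if same_filesystem(src_uri, dst_uri) else "copy_then_delete"


src = "s3a://hive-on-s3/s3_tbl/country=USA/state=CA"
dst = "hdfs://localhost:9000/hdfs_tbl/country=USA/state=CA"  # hypothetical path
print(choose_move_strategy(src, dst))
```

The same comparison also indicates which filesystem an existence check for the 
destination path belongs to, which is the first problem the description 
mentions.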




[jira] [Commented] (HIVE-18056) CachedStore: Have a whitelist/blacklist config to allow selective caching of tables/partitions and allow read while prewarming

2017-11-19 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258730#comment-16258730
 ] 

Lefty Leverenz commented on HIVE-18056:
---

Doc note:  This adds *hive.metastore.cached.rawstore.cached.object.whitelist* 
and *hive.metastore.cached.rawstore.cached.object.blacklist* to HiveConf.java, 
so they need to be documented in the wiki.

* [Configuration Properties -- Metastore | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore]

General documentation is also needed for CachedStore.

Added a TODOC3.0 label.
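
As a rough illustration of selective caching with a whitelist and a blacklist, 
the sketch below caches a table only if it matches some whitelist pattern and 
no blacklist pattern. This is an assumption about the semantics, written in 
Python for illustration; the function name, the use of regular expressions 
over "db.table" names, and the sample patterns are all hypothetical and are 
not taken from the patch.

```python
import re
from typing import Iterable


def should_cache(table: str, whitelist: Iterable[str], blacklist: Iterable[str]) -> bool:
    """Return True if `table` matches a whitelist pattern and no blacklist pattern."""
    allowed = any(re.fullmatch(pattern, table) for pattern in whitelist)
    denied = any(re.fullmatch(pattern, table) for pattern in blacklist)
    return allowed and not denied


# Hypothetical patterns: cache everything in `default` except temporary tables.
whitelist = [r"default\..*"]
blacklist = [r"default\.tmp_.*"]

for name in ["default.sales", "default.tmp_scratch", "other.sales"]:
    print(name, should_cache(name, whitelist, blacklist))
```

Letting the blacklist win over the whitelist is a design choice in this 
sketch; the wiki documentation should state which of the two takes precedence.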

> CachedStore: Have a whitelist/blacklist config to allow selective caching of 
> tables/partitions and allow read while prewarming
> --
>
> Key: HIVE-18056
> URL: https://issues.apache.org/jira/browse/HIVE-18056
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Daniel Dai
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-18056.1.patch, HIVE-18056.2.patch, 
> HIVE-18056.3.patch, HIVE-18056.4.patch, HIVE-18056.5.patch, 
> HIVE-18056.6.patch, HIVE-18056.7.patch, HIVE-18056.8.patch
>
>





[jira] [Updated] (HIVE-17528) Add more q-tests for Hive-on-Spark with Parquet vectorized reader

2017-11-19 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-17528:

Attachment: HIVE-17528.5.patch

Rebase to the latest code.

> Add more q-tests for Hive-on-Spark with Parquet vectorized reader
> -
>
> Key: HIVE-17528
> URL: https://issues.apache.org/jira/browse/HIVE-17528
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Ferdinand Xu
> Attachments: HIVE-17528.1.patch, HIVE-17528.2.patch, 
> HIVE-17528.3.patch, HIVE-17528.4.patch, HIVE-17528.5.patch, HIVE-17528.patch
>
>
> Most of the vectorization-related q-tests operate on ORC tables using Tez. It 
> would be good to add more coverage on a different combination of engine and 
> file format. We can model existing q-tests using Parquet tables and run them 
> using TestSparkCliDriver.




[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true" o ACID is enabled

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Summary: Hive insertion for complex types not working when 
"transactional=true" o ACID is enabled  (was: Hive insertion for complex types 
not working when "transactional=true")

> Hive insertion for complex types not working when "transactional=true" o ACID 
> is enabled
> 
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with:
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>Assignee: Hive QA
>
> I merge daily into a table that has a column whose type is an array of 
> structs:
> {color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> {color}
> *When table is created without transactional=true, behaviour is fine.*
> Example snippet:
> {code:sql}
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC;
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select * from struct_merge;
> {code}
> hive> select * from default.struct_merge;
> OK
> {color:blue}1 
> [{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
> {color}
> Time taken: 0.125 seconds, Fetched: 1 row(s)
> *With transactional = true, behaviour is erratic: null values are populated 
> as the values of the nested structs.*
> Eg:
> {code:sql}
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select * from struct_merge;
> //this one gives null values
> {code}
> hive> select * from default.struct_merge1;
> OK
> {color:red}1  
> [{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
> {color}
> Time taken: 0.608 seconds, Fetched: 1 row(s)
> *Can this behaviour be explained? I need the transaction property since I am 
> merging into a common table on daily data.*




[jira] [Commented] (HIVE-16756) Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: / by zero"

2017-11-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258641#comment-16258641
 ] 

Hive QA commented on HIVE-16756:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12898390/HIVE-16756.06-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10657 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=153)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[merge_negative_5]
 (batchId=88)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7914/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7914/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7914/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12898390 - PreCommit-HIVE-Build

> Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: 
> / by zero"
> 
>
> Key: HIVE-16756
> URL: https://issues.apache.org/jira/browse/HIVE-16756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Attachments: HIVE-16756.01.patch, HIVE-16756.02.patch, 
> HIVE-16756.03.patch, HIVE-16756.05-branch-2.patch, 
> HIVE-16756.06-branch-2.patch
>
>
> vectorization_div0.q needs to test the long data type.




[jira] [Assigned] (HIVE-18103) TestSparkCliDriver.testCliDriver[vectorized_ptf] failing on branch-2

2017-11-19 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-18103:
--


> TestSparkCliDriver.testCliDriver[vectorized_ptf] failing on branch-2
> 
>
> Key: HIVE-18103
> URL: https://issues.apache.org/jira/browse/HIVE-18103
> Project: Hive
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> TestSparkCliDriver.testCliDriver[vectorized_ptf.q] and 
> TestSparkCliDriver.testCliDriver[vectorization_7.q] are failing on branch-2




[jira] [Commented] (HIVE-16756) Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: / by zero"

2017-11-19 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258611#comment-16258611
 ] 

Vihang Karajgaonkar commented on HIVE-16756:


Some of the vectorization test failures are related. It looks like there are 
differences in the template file between master and branch-2 which are causing 
this. I regenerated {{LongColModuloLongColumn.java}} from the template, then 
applied the fix on top of it to fix these test failures. Some of the 
vectorization tests are showing diff failures even without the patch. I will 
create a separate JIRA to fix them on branch-2.


> Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: 
> / by zero"
> 
>
> Key: HIVE-16756
> URL: https://issues.apache.org/jira/browse/HIVE-16756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Attachments: HIVE-16756.01.patch, HIVE-16756.02.patch, 
> HIVE-16756.03.patch, HIVE-16756.05-branch-2.patch, 
> HIVE-16756.06-branch-2.patch
>
>
> vectorization_div0.q needs to test the long data type.




[jira] [Updated] (HIVE-16756) Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: / by zero"

2017-11-19 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16756:
---
Attachment: HIVE-16756.06-branch-2.patch

> Vectorization: LongColModuloLongColumn throws "java.lang.ArithmeticException: 
> / by zero"
> 
>
> Key: HIVE-16756
> URL: https://issues.apache.org/jira/browse/HIVE-16756
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0
>Reporter: Matt McCline
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Attachments: HIVE-16756.01.patch, HIVE-16756.02.patch, 
> HIVE-16756.03.patch, HIVE-16756.05-branch-2.patch, 
> HIVE-16756.06-branch-2.patch
>
>
> vectorization_div0.q needs to test the long data type.




[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Eugene Koifman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18102:
--
Component/s: Transactions

> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>Assignee: Hive QA
>
> I am merging into a table daily which has a column type as an array of 
> structs :
> {color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> {color}
> *When table is created without transactional=true, behaviour is fine.*
> Example snippet:
> {code:sql}
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC;
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select * from struct_merge;
> {code}
> hive> select * from default.struct_merge;
> OK
> {color:blue}1 
> [{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
> {color}
> Time taken: 0.125 seconds, Fetched: 1 row(s)
> *With transactional = true, behaviour is erratic, null values are populated 
> as values of nested Structs.*
> Eg:
> {code:sql}
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select * from struct_merge;
> //this one gives null values
> {code}
> hive> select * from default.struct_merge1;
> OK
> {color:red}1  
> [{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
> {color}
> Time taken: 0.608 seconds, Fetched: 1 row(s)
> *Can this behaviour be explained? I need the transaction property since I am 
> merging into a common table on daily data.*




[jira] [Assigned] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi reassigned HIVE-18102:
-

Assignee: Hive QA  (was: Kiet Ly)

> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>Assignee: Hive QA
>




[jira] [Assigned] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi reassigned HIVE-18102:
-

Assignee: Kiet Ly

> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>Assignee: Kiet Ly
>




[jira] [Updated] (HIVE-13198) Authorization issues with cascading views

2017-11-19 Thread Wang Haihua (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-13198:
---
Description: 
Here is a use case. They have a base table t1, from which they create a view 
v1. They further create a view v2 from v1 by applying a filter. User has access 
to only view v2, not view v1 or table t1. When user tries to access v2, they 
are denied access. 

Steps to recreate:
There is a base table t1 that exists in the default database with primary key 
id and some employee data (name, ssn etc)
Create view v1 - “create view v1 as select * from default.t1;”
Create view v2 - “create view v2 as select * from v1 where id = 1;”

The user is granted permission to select all columns from view v2. When the user 
runs select * from v2, Hive throws an error: “user does not have permissions to 
select view v1”.

Apparently Hive is converting the query to underlying views.
SELECT * FROM v2 LIMIT 100
To
select `v1`.`id`, `v1`.`name`, `v1`.`ssn`, `v1`.`join_date`, `v1`.`location` 
from `hr`.`v1` where `v1`.`id`=1

Hive should only check for permissions for the view being run in the query, not 
any parent views. (This is consistent with ORACLE).

  was:
Here is a use case. They have a base table t1, from which they create a view 
v1. They further create a view v2 from v1 by applying a filter. User has access 
to only view v2, not view v1 or table t1. When user tries to access v2, they 
are denied access. 

Steps to recreate:
There is a base table t1 that exists in the default database with primary key 
id and some employee data (name, ssn etc)
Create view v1 - “create view v1 as select * from default.t1;”
Create view v2 - “create view v2 as select * from v1 where id = 1;”

The user is granted permission to select all columns from view v2. When the user 
runs select * from v2, Hive throws an error: “user does not have permissions to 
select view v1”.

Apparently Hive is converting the query to underlying views.
SELECT * FROM v2 LIMIT 100
To
select `v1`.`id`, `v1`.`name`, `v1`.`ssn`, `v1`.`join_date`, `v1`.`location` 
from `hr`.`v1` where `v1`.`id`=1

Hive should only check for permissions for the view being run in the query, not 
any parent views. (This is consistent with ORACLE).


> Authorization issues with cascading views
> -
>
> Key: HIVE-13198
> URL: https://issues.apache.org/jira/browse/HIVE-13198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-13198.01.patch, HIVE-13198.02.patch
>
>
> Here is a use case. They have a base table t1, from which they create a view 
> v1. They further create a view v2 from v1 by applying a filter. User has 
> access to only view v2, not view v1 or table t1. When user tries to access 
> v2, they are denied access. 
> Steps to recreate:
> There is a base table t1 that exists in the default database with primary key 
> id and some employee data (name, ssn etc)
> Create view v1 - “create view v1 as select * from default.t1;”
> Create view v2 - “create view v2 as select * from v1 where id = 1;”
> The user is granted permission to select all columns from view v2. When the 
> user runs select * from v2, Hive throws an error: “user does not have 
> permissions to select view v1”.
> Apparently Hive is converting the query to underlying views.
> SELECT * FROM v2 LIMIT 100
> To
> select `v1`.`id`, `v1`.`name`, `v1`.`ssn`, `v1`.`join_date`, `v1`.`location` 
> from `hr`.`v1` where `v1`.`id`=1
> Hive should only check for permissions for the view being run in the query, 
> not any parent views. (This is consistent with ORACLE).
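
The behaviour requested above can be sketched as follows: only objects the user names directly in the query are checked, while objects reached by expanding a view definition are skipped. This is a hypothetical model for illustration and not Hive's authorization code. ViewAuthSketch, Access and missingPrivileges are invented names, and the "direct" flag is an assumption about how the planner could mark inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the requested check; not taken from Hive.
final class ViewAuthSketch {

    /** One object read by a query. direct is false when it is reached only by expanding a view. */
    static final class Access {
        final String name;
        final boolean direct;

        Access(String name, boolean direct) {
            this.name = name;
            this.direct = direct;
        }
    }

    /** Returns the directly referenced objects for which the user lacks SELECT. */
    static List<String> missingPrivileges(List<Access> accesses, Set<String> granted) {
        List<String> missing = new ArrayList<>();
        for (Access a : accesses) {
            // Parent views and base tables behind a view are not checked.
            if (a.direct && !granted.contains(a.name)) {
                missing.add(a.name);
            }
        }
        return missing;
    }
}
```

With v2 marked as direct and v1 and t1 marked as indirect, a user holding SELECT on v2 alone has no missing privileges in this model, which matches the expectation stated in the description.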




[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
{color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}

*When table is created without transactional=true, behaviour is fine.*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}
Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.*
Eg:
{code:sql}


drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values
{code}


hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}
Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
{color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}

*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}
Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:
{code:sql}


drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values
{code}


hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}
Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>

[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
{color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}

*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:
{code:sql}


drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values
{code}


hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
{color:#205081}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}
*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>

[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
{color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}

*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}
Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:
{code:sql}


drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values
{code}


hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}
Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
{color:blue}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}

*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:
{code:sql}


drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values
{code}


hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>

[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
{color:#205081}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}
*When table is created without transactional=true, behaviour is fine.
*
Example snippet:

{code:sql}

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

{code}
hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
{color:#205081}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}
*When table is created without transactional=true, behaviour is fine.
*
Example snippet:
??author??
drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic, null values are populated as 
values of nested Structs.
*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>
> I am merging into a table daily which has a column type as an array of 
> structs :
> {color:#205081}segment_info ARRAY < STRUCT  idlpSegmentValue: STRING >>
> {color}
> *When table is created without transactional=true, behaviour is fine.*
> Example snippet:
> {code:sql}
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC;
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
>

[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
{color:#205081}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
{color}
*When table is created without transactional=true, behaviour is fine.*
Example snippet:
??author??
drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic: null values are populated as 
values of nested structs.*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>

*When table is created without transactional=true, behaviour is fine.*
Example snippet:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic: null values are populated as 
values of nested structs.*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>
> I am merging into a table daily which has a column type as an array of 
> structs :
> {color:#205081}segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> {color}
> *When table is created without transactional=true, behaviour is fine.*
> Example snippet:
> ??author??
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC;
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select *

[jira] [Updated] (HIVE-18102) Hive insertion for complex types not working when "transactional=true"

2017-11-19 Thread Nillohit Nandi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nillohit Nandi updated HIVE-18102:
--
Description: 
I am merging into a table daily which has a column type as an array of structs :
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>

*When table is created without transactional=true, behaviour is fine.*
Example snippet:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

hive> select * from default.struct_merge;
OK
{color:blue}1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
{color}Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic: null values are populated as 
values of nested structs.*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
{color:red}1
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
{color}Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*


  was:
I am merging into a table daily which has a column type as an array of structs :
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>

*When table is created without transactional=true, behaviour is fine.*
Example snippet:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC;

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;

hive> select * from default.struct_merge;
OK
1   
[{"idlpSegmentName":"viant","idlpSegmentValue":"z"},{"idlpSegmentName":"instyle","idlpSegmentValue":"3"}]
Time taken: 0.125 seconds, Fetched: 1 row(s)


*With transactional = true, behaviour is erratic: null values are populated as 
values of nested structs.*
Eg:

drop table struct_merge;

CREATE TABLE struct_merge (
lr_id STRING,
segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
)
CLUSTERED BY(lr_id)
INTO 1 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE struct_merge 
   Select 1 AS lr_id , 
   
ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
AS segment_info; 

select * from struct_merge;
//this one gives null values

hive> select * from default.struct_merge1;
OK
1   
[{"idlpSegmentName":null,"idlpSegmentValue":null},{"idlpSegmentName":null,"idlpSegmentValue":null}]
Time taken: 0.608 seconds, Fetched: 1 row(s)


*Can this behaviour be explained? I need the transaction property since I am 
merging into a common table on daily data.*



> Hive insertion for complex types not working when "transactional=true"
> --
>
> Key: HIVE-18102
> URL: https://issues.apache.org/jira/browse/HIVE-18102
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.1
> Environment: Running EMR cluster on AWS, with :
> Master: Running 1 m3.xlarge
> Core: Running 4 m3.xlarge
>Reporter: Nillohit Nandi
>
> I am merging into a table daily which has a column type as an array of 
> structs :
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> *When table is created without transactional=true, behaviour is fine.*
> Example snippet:
> drop table struct_merge;
> CREATE TABLE struct_merge (
> lr_id STRING,
> segment_info ARRAY < STRUCT < idlpSegmentName: STRING, idlpSegmentValue: STRING >>
> )
> CLUSTERED BY(lr_id)
> INTO 1 BUCKETS
> STORED AS ORC;
> INSERT INTO TABLE struct_merge 
>Select 1 AS lr_id , 
>
> ARRAY(NAMED_STRUCT('idlpSegmentName','viant','idlpSegmentValue','z'), 
> NAMED_STRUCT('idlpSegmentName','instyle','idlpSegmentValue','3')) 
> AS segment_info; 
> select * from struct_merge;
> hive> select * from default.struct_merge;
> OK
> {color:blue}1 
>
