[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2015-04-25 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---
Attachment: HIVE-7805.2.patch

Finally got a chance to rebase against latest trunk. I'll update the review as 
well.

 Support running multiple scans in hbase-handler
 ---

 Key: HIVE-7805
 URL: https://issues.apache.org/jira/browse/HIVE-7805
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.14.0
Reporter: Andrew Mains
Assignee: Andrew Mains
 Attachments: HIVE-7805.1.patch, HIVE-7805.2.patch, HIVE-7805.patch


 Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
 This can be less efficient than running multiple disjoint scans in certain 
 cases, particularly when using a composite row key. For instance, given a row 
 key schema of:
 {code}
 struct<bucket int, time timestamp>
 {code}
 if one wants to push down the predicate:
 {code}
 bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
 {code}
 it's much more efficient to run a scan for each bucket over the time range 
 (particularly if there's a large amount of data per day). With a single scan, 
 the MR job has to process the data for all time for buckets in between 1 and 
 100.
 Hive should allow HBaseKeyFactory implementations to decompose a predicate into 
 one or more scans in order to take advantage of this fact.
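
 As a rough illustration of the decomposition described above, here is a minimal Python sketch. It is not Hive or HBase code: the names `decompose` and `single_scan` and the tuple encoding of the row key are assumptions made only for this example. It contrasts one covering range with one range per bucket.

```python
from typing import List, Tuple

# Hypothetical model of the composite row key: (bucket, time).
Key = Tuple[int, int]


def decompose(buckets: List[int], t_start: int, t_stop: int) -> List[Tuple[Key, Key]]:
    """Return one [start, stop) key range per distinct bucket, in bucket order."""
    return [((b, t_start), (b, t_stop)) for b in sorted(set(buckets))]


def single_scan(buckets: List[int], t_start: int, t_stop: int) -> Tuple[Key, Key]:
    """Return the one covering range, which spans every bucket in between."""
    return ((min(buckets), t_start), (max(buckets), t_stop))


scans = decompose([1, 10, 100], 1408333927, 1408506670)
covering = single_scan([1, 10, 100], 1408333927, 1408506670)
```

 With the predicate from the description, `scans` holds three narrow ranges, while `covering` runs from bucket 1 to bucket 100 and so also touches rows of buckets that the predicate excludes.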



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3404) Create quarter UDF

2015-04-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512704#comment-14512704
 ] 

Hive QA commented on HIVE-3404:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728168/HIVE-3404.3.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8818 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a 
TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did 
not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3584/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3584/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3584/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728168 - PreCommit-HIVE-TRUNK-Build

 Create quarter UDF
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
Assignee: Alexander Pivovarov
 Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, 
 HIVE-3404.2.patch, HIVE-3404.3.patch


 The function QUARTER(date) would return the quarter from a string / date / 
 timestamp. This will be useful for different domains like retail, finance, etc.
 MySQL has a QUARTER function:
 https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter
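
 For reference, the arithmetic behind such a function can be sketched in a few lines of Python. This is only an illustration of the month-to-quarter mapping, not the UDF from the patch; the function name `quarter` and the choice to return None for a missing input are assumptions.

```python
from datetime import date
from typing import Optional


def quarter(d: Optional[date]) -> Optional[int]:
    """Return the quarter (1 to 4) containing d's month, or None for None."""
    if d is None:
        return None
    # Months 1-3 map to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
    return (d.month - 1) // 3 + 1
```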





[jira] [Commented] (HIVE-10473) Spark client is recreated even spark configuration is not changed

2015-04-25 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512601#comment-14512601
 ] 

Jimmy Xiang commented on HIVE-10473:


If the new value is null, the set will fail at Configuration#set(String name, 
String value, String source), which checks that neither name nor value 
is null.

 Spark client is recreated even spark configuration is not changed
 -

 Key: HIVE-10473
 URL: https://issues.apache.org/jira/browse/HIVE-10473
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: HIVE-10473.1-spark.patch, HIVE-10473.1.patch


 Currently, we consider a Spark setting changed whenever the set method is 
 called, even if we set it to the same value as before. We should also check 
 whether the value actually changed, since it takes time to start a new Spark client. 
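
 The proposed check can be sketched as follows. This is a minimal Python model, not the Hive session code: the class `SparkSettings` and its `changed` flag are hypothetical stand-ins, and rejecting None mirrors the null check in Configuration#set mentioned in the comment above.

```python
class SparkSettings:
    """Hypothetical holder for Spark settings with a 'changed' flag."""

    def __init__(self):
        self._values = {}
        self.changed = False  # True means a new Spark client would be needed

    def set(self, name, value):
        if name is None or value is None:
            raise ValueError("name and value must not be None")
        # Flag a change only when the stored value actually differs.
        if self._values.get(name) != value:
            self._values[name] = value
            self.changed = True
```

 Setting a key to the value it already holds leaves `changed` untouched, so the expensive client restart is skipped in that case.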





[jira] [Commented] (HIVE-10455) CBO (Calcite Return Path): Different data types at Reducer before JoinOp

2015-04-25 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512819#comment-14512819
 ] 

Pengcheng Xiong commented on HIVE-10455:


[~jcamachorodriguez], I agree with you and I uploaded a new patch. I assume 
that it can pass all the CBO tests after HIVE-10416 and HIVE-10479. Could you take 
another look? Thanks.

 CBO (Calcite Return Path): Different data types at Reducer before JoinOp
 

 Key: HIVE-10455
 URL: https://issues.apache.org/jira/browse/HIVE-10455
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-10455.01.patch, HIVE-10455.02.patch


 The following error occurred for cbo_subq_not_in.q 
 {code}
 java.lang.Exception: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable 
 to deserialize reduce input key from x1x128x0x0x1 with properties 
 {columns=reducesinkkey0, 
 serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
  serialization.sort.order=+, columns.types=double}
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
 {code}
 An easier way to reproduce is 
 {code}
 set hive.cbo.enable=true;
 set hive.exec.check.crossproducts=false;
 set hive.stats.fetch.column.stats=true;
 set hive.auto.convert.join=false;
 select p_size, src.key
 from 
 part join src
 on p_size=key;
 {code}
 As you can see, p_size is an integer while src.key is a string. Both of them 
 should be cast to double when they are joined. When the return path is off, this 
 cast happens before the Join, at the RS. However, when the return path is on, it 
 is treated as an expression inside the Join. Thus, when the reducer collects 
 keys of different types from different join branches, it throws an exception.
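
 The need for a common key type can be illustrated with a small Python sketch. It is not Hive code: `to_double` and `join` are hypothetical helpers, and treating a non-numeric string as a NULL key that matches nothing is an assumption of the sketch.

```python
def to_double(v):
    """Cast an int or numeric string join key to float; None if not numeric."""
    try:
        return float(v)
    except (TypeError, ValueError):
        return None


def join(left, right):
    """Equi-join two lists of keys after casting both sides to double."""
    index = {}
    for r in right:
        k = to_double(r)
        if k is not None:  # keys that cannot be cast never match
            index.setdefault(k, []).append(r)
    return [(l, r) for l in left for r in index.get(to_double(l), [])]


pairs = join([1, 2, 3], ["2", "3.0", "x"])
```

 Because both branches are cast before the keys are compared, the integer 3 matches the string "3.0"; comparing the raw values would find no match at all.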





[jira] [Updated] (HIVE-3404) Create quarter UDF

2015-04-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-3404:
--
Attachment: HIVE-3404.3.patch

patch #3
- add VOID_GROUP to checkArgGroups
- add null void type test

 Create quarter UDF
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
Assignee: Alexander Pivovarov
 Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, 
 HIVE-3404.2.patch, HIVE-3404.3.patch


 The function QUARTER(date) would return the quarter from a string / date / 
 timestamp. This will be useful for different domains like retail, finance, etc.
 MySQL has a QUARTER function:
 https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter





[jira] [Commented] (HIVE-10455) CBO (Calcite Return Path): Different data types at Reducer before JoinOp

2015-04-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512853#comment-14512853
 ] 

Hive QA commented on HIVE-10455:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728191/HIVE-10455.02.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3588/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3588/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3588/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3588/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-git-master-source ]]
+ [[ ! -d apache-git-master-source/.git ]]
+ [[ ! -d apache-git-master-source ]]
+ cd apache-git-master-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 123bb8e Preparing for 1.3.0 development
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 123bb8e Preparing for 1.3.0 development
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728191 - PreCommit-HIVE-TRUNK-Build

 CBO (Calcite Return Path): Different data types at Reducer before JoinOp
 

 Key: HIVE-10455
 URL: https://issues.apache.org/jira/browse/HIVE-10455
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-10455.01.patch, HIVE-10455.02.patch


 The following error occurred for cbo_subq_not_in.q 
 {code}
 java.lang.Exception: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable 
 to deserialize reduce input key from x1x128x0x0x1 with properties 
 {columns=reducesinkkey0, 
 serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
  serialization.sort.order=+, columns.types=double}
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
 {code}
 An easier way to reproduce is 
 {code}
 set hive.cbo.enable=true;
 set hive.exec.check.crossproducts=false;
 set hive.stats.fetch.column.stats=true;
 set hive.auto.convert.join=false;
 select p_size, src.key
 from 
 part join src
 on p_size=key;
 {code}
 As you can see, p_size is an integer while src.key is a string. Both of them 
 should be cast to double when they are joined. When the return path is off, this 
 cast happens before the Join, at the RS. However, when the return path is on, it 
 is treated as an expression inside the Join. Thus, when the reducer collects 
 keys of different types from different join branches, it throws an exception.





[jira] [Commented] (HIVE-10486) Update wiki for switch from svn to git

2015-04-25 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512860#comment-14512860
 ] 

Lefty Leverenz commented on HIVE-10486:
---

These wikidocs need to be revised:

#  [Developer Guide | 
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide] -- 2 instances 
of svn
#  [Hive Developer FAQ | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ] -- 3 
instances of svn
#  [How To Contribute | 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute] -- 17 
instances of svn
#  [How To Commit | 
https://cwiki.apache.org/confluence/display/Hive/HowToCommit] -- 12 instances 
of svn (first one is an obsolete URL for credits.xml, whole paragraph needs 
revision; Committing Documentation also needs complete revision now that docs 
are in the wiki)
#  [How To Release | 
https://cwiki.apache.org/confluence/display/Hive/HowToRelease] -- 22 instances 
of svn
#  [How to edit the website | 
https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website] -- 2 
instances of svn
#  [Hive PreCommit Patch Testing | 
https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing] 
-- 1 instance of svn
#  [Jenkins Script | 
https://cwiki.apache.org/confluence/display/Hive/Jenkins+Script] -- 2 instances 
of svn
#  [Getting Started | 
https://cwiki.apache.org/confluence/display/Hive/GettingStarted] -- 7 instances 
of svn
#  [Admin Manual Installation | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation] -- 4 
instances of svn
#  [Hive Web Interface | 
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface] -- 1 
instance of svn
#  [Generic UDAF Case Study | 
https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy] -- 4 
instances of svn
#  [WebHCat Configure | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure] -- 4 
instances of svn


 Update wiki for switch from svn to git
 --

 Key: HIVE-10486
 URL: https://issues.apache.org/jira/browse/HIVE-10486
 Project: Hive
  Issue Type: Bug
Reporter: Lefty Leverenz

 The Hive wiki has many svn instructions that need to be changed to their git 
 equivalents.





[jira] [Updated] (HIVE-10485) Create md5 UDF

2015-04-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10485:
---
Description: 
MD5(str)
Calculates an MD5 128-bit checksum for the string. The value is returned as a 
string of 32 hex digits, or NULL if the argument was NULL. The return value 
can, for example, be used as a hash key.
Example:
{code}
SELECT MD5('udf_md5');
'ce62ef0d2d27dc37b6d488b92f4b24fd'
{code}

online md5 generator: http://www.md5.cz/

 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
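
 The behaviour described above can be sketched in Python with the standard hashlib module. This is an illustration of the described semantics, not the UDF itself; the name `md5_hex` is hypothetical and the UTF-8 encoding of the input is an assumption of the sketch.

```python
import hashlib
from typing import Optional


def md5_hex(s: Optional[str]) -> Optional[str]:
    """Return the MD5 digest of s as 32 hex digits, or None if s is None."""
    if s is None:
        return None
    # Assumption: the string is hashed as its UTF-8 byte sequence.
    return hashlib.md5(s.encode("utf-8")).hexdigest()
```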





[jira] [Updated] (HIVE-10485) Create md5 UDF

2015-04-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10485:
---
Description: 
MD5(str)
Calculates an MD5 128-bit checksum for the string. The value is returned as a 
string of 32 hex digits, or NULL if the argument was NULL. The return value 
can, for example, be used as a hash key.
Example:
{code}
SELECT MD5('udf_md5');
'ce62ef0d2d27dc37b6d488b92f4b24fd'
{code}

online md5 generator: http://www.md5.cz/

MySQL has md5 function: 
https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
PostgreSQL also has md5 function: 
http://www.postgresql.org/docs/9.1/static/functions-string.html

  was:
MD5(str)
Calculates an MD5 128-bit checksum for the string. The value is returned as a 
string of 32 hex digits, or NULL if the argument was NULL. The return value 
can, for example, be used as a hash key.
Example:
{code}
SELECT MD5('udf_md5');
'ce62ef0d2d27dc37b6d488b92f4b24fd'
{code}

online md5 generator: http://www.md5.cz/


 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html





[jira] [Updated] (HIVE-10485) Create md5 UDF

2015-04-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10485:
---
Description: 
MD5(str)
Calculates an MD5 128-bit checksum for a UTF-8 string. The value is returned as a 
string of 32 hex digits, or NULL if the argument was NULL. The return value 
can, for example, be used as a hash key.
Example:
{code}
SELECT MD5('udf_md5');
'ce62ef0d2d27dc37b6d488b92f4b24fd'
{code}

online md5 generator: http://www.md5.cz/

MySQL has md5 function: 
https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
PostgreSQL also has md5 function: 
http://www.postgresql.org/docs/9.1/static/functions-string.html

  was:
MD5(str)
Calculates an MD5 128-bit checksum for the string. The value is returned as a 
string of 32 hex digits, or NULL if the argument was NULL. The return value 
can, for example, be used as a hash key.
Example:
{code}
SELECT MD5('udf_md5');
'ce62ef0d2d27dc37b6d488b92f4b24fd'
{code}

online md5 generator: http://www.md5.cz/

MySQL has md5 function: 
https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
PostgreSQL also has md5 function: 
http://www.postgresql.org/docs/9.1/static/functions-string.html


 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 MD5(str)
 Calculates an MD5 128-bit checksum for a UTF-8 string. The value is returned as 
 a string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html





[jira] [Updated] (HIVE-10477) Provide option to disable Spark tests

2015-04-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10477:
-
Attachment: HIVE-10477.01.patch

 Provide option to disable Spark tests 
 --

 Key: HIVE-10477
 URL: https://issues.apache.org/jira/browse/HIVE-10477
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10477.01.patch


 The following is one of the reasons why we might want to provide an option to 
 disable Spark tests:
 In the current master branch, unit tests fail on Windows because of the 
 dependency on the bash executable in itests/hive-unit/pom.xml around these 
 lines:
 {code}
 <target>
   <exec executable="bash" dir="${basedir}" failonerror="true">
     <arg line="../target/download.sh"/>
   </exec>
 </target>
 {code}
 We should provide an option to disable Spark tests on OSes like Windows 
 where bash might be absent. That said, Spark tests will remain enabled 
 by default in pre-commit test runs and should continue to work as they do 
 in the master branch.





[jira] [Commented] (HIVE-9645) Constant folding case NULL equality

2015-04-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512357#comment-14512357
 ] 

Gopal V commented on HIVE-9645:
---

[~apivovarov]: the VOID handling cases were only partly done, since the 
newest patch generates ((int)null) instead of void(null) for column types.

 Constant folding case NULL equality
 ---

 Key: HIVE-9645
 URL: https://issues.apache.org/jira/browse/HIVE-9645
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Gopal V
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0

 Attachments: HIVE-9645.1.patch, HIVE-9645.2.patch, HIVE-9645.3.patch, 
 HIVE-9645.4.patch, HIVE-9645.5.patch, HIVE-9645.6.patch, HIVE-9645.7.patch, 
 HIVE-9645.patch


 Hive logical optimizer does not follow the Null scan codepath when 
 encountering a NULL = 1;
 NULL = 1 is not evaluated as false in the constant propagation implementation.
 {code}
 hive> explain select count(1) from store_sales where null=1;
 ...
  TableScan
   alias: store_sales
   filterExpr: (null = 1) (type: boolean)
   Statistics: Num rows: 550076554 Data size: 49570324480 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (null = 1) (type: boolean)
 Statistics: Num rows: 275038277 Data size: 0 Basic stats: 
 PARTIAL Column stats: COMPLETE
 {code}
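
 The folding rule at issue can be sketched in Python, with None standing in for SQL NULL. This only models three-valued comparison of constants and is not the optimizer's implementation; `fold_equals` and `filter_is_never_true` are hypothetical names.

```python
def fold_equals(left, right):
    """Fold `left = right` for constants; None models SQL NULL (unknown)."""
    if left is None or right is None:
        return None  # a comparison with NULL is unknown, never TRUE
    return left == right


def filter_is_never_true(left, right):
    """A WHERE clause keeps a row only when its predicate is TRUE."""
    return fold_equals(left, right) is not True
```

 Since `fold_equals(None, 1)` never yields TRUE, a filter on it keeps no rows, which is the case where a table scan could be skipped entirely.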





[jira] [Commented] (HIVE-9645) Constant folding case NULL equality

2015-04-25 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512312#comment-14512312
 ] 

Alexander Pivovarov commented on HIVE-9645:
---

Why was VOID added to obtainIntConverter but not to obtainLongConverter in 
GenericUDF?

The same question applies to obtainDateConverter and obtainTimestampConverter.

 Constant folding case NULL equality
 ---

 Key: HIVE-9645
 URL: https://issues.apache.org/jira/browse/HIVE-9645
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Gopal V
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0

 Attachments: HIVE-9645.1.patch, HIVE-9645.2.patch, HIVE-9645.3.patch, 
 HIVE-9645.4.patch, HIVE-9645.5.patch, HIVE-9645.6.patch, HIVE-9645.7.patch, 
 HIVE-9645.patch


 Hive logical optimizer does not follow the Null scan codepath when 
 encountering a NULL = 1;
 NULL = 1 is not evaluated as false in the constant propagation implementation.
 {code}
 hive> explain select count(1) from store_sales where null=1;
 ...
  TableScan
   alias: store_sales
   filterExpr: (null = 1) (type: boolean)
   Statistics: Num rows: 550076554 Data size: 49570324480 
 Basic stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (null = 1) (type: boolean)
 Statistics: Num rows: 275038277 Data size: 0 Basic stats: 
 PARTIAL Column stats: COMPLETE
 {code}





[jira] [Updated] (HIVE-10485) Create md5 UDF

2015-04-25 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10485:
---
Attachment: HIVE-10485.1.patch

patch #1

 Create md5 UDF
 --

 Key: HIVE-10485
 URL: https://issues.apache.org/jira/browse/HIVE-10485
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10485.1.patch


 MD5(str)
 Calculates an MD5 128-bit checksum for the string. The value is returned as a 
 string of 32 hex digits, or NULL if the argument was NULL. The return value 
 can, for example, be used as a hash key.
 Example:
 {code}
 SELECT MD5('udf_md5');
 'ce62ef0d2d27dc37b6d488b92f4b24fd'
 {code}
 online md5 generator: http://www.md5.cz/
 MySQL has md5 function: 
 https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5
 PostgreSQL also has md5 function: 
 http://www.postgresql.org/docs/9.1/static/functions-string.html





[jira] [Commented] (HIVE-6774) Not a valid JAR errors from TestExecDriver

2015-04-25 Thread Jan Morlock (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512368#comment-14512368
 ] 

Jan Morlock commented on HIVE-6774:
---

I find this very annoying. Every Hive newcomer who reads the 
GettingStarted guide and follows the instructions there will run 
into this frustrating situation. See for example:

http://stackoverflow.com/questions/25353207/hive-testmapplan1org-apache-hadoop-hive-ql-exec-testexecdriver-failed

 Not a valid JAR errors from TestExecDriver
 

 Key: HIVE-6774
 URL: https://issues.apache.org/jira/browse/HIVE-6774
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere

 If I wipe out my local Maven repository and run the command:
 mvn clean install -Dtest=TestExecDriver -Phadoop-1
 All of the TestExecDriver tests fail with the following errors:
 {noformat}
 Not a valid JAR: 
 /Users/jdere/.m2/repository/org/apache/hive/hive-exec/0.14.0-SNAPSHOT/hive-exec-0.14.0-SNAPSHOT.jar
 Execution failed with exit status: 255
 Obtaining error information
 Task failed!
 Task ID:
   null
 Logs:
 /Users/jdere/dev/hive.git/ql/target/tmp/log/hive.log
 java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.session.SessionState.addLocalMapRedErrors(SessionState.java:919)
 at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:282)
 at org.apache.hadoop.hive.ql.exec.TestExecDriver.executePlan(TestExecDriver.java:460)
 at org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1(TestExecDriver.java:474)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:243)
 at junit.framework.TestSuite.run(TestSuite.java:238)
 at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
 at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
 at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
 at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 {noformat}





[jira] [Updated] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2015-04-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5202:

Issue Type: Improvement  (was: Bug)

 Support for SettableUnionObjectInspector and implement 
 isSettable/hasAllFieldsSettable APIs for all data types.
 ---

 Key: HIVE-5202
 URL: https://issues.apache.org/jira/browse/HIVE-5202
 Project: Hive
  Issue Type: Improvement
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Fix For: 0.13.0

 Attachments: HIVE-5202.2.patch.txt, HIVE-5202.patch


 These 3 tasks should be accomplished as part of this jira:
 1. The current implementation lacks a settable union object inspector. We can 
 run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
 there is a union.
 2. Implement the following public functions for all datatypes: 
 isSettable() - performs a shallow check to see if an object inspector is 
 inherited from a settableOI type, and 
 hasAllFieldsSettable() - performs a deep check to see if this objectInspector 
 and all the underlying object inspectors are inherited from a settableOI type.
 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
 (2) are implemented, add a check on outputOI.hasAllFieldsSettable() so that 
 outputOI is returned immediately if the object is entirely settable, which 
 prevents redundant object instantiation.
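
 The difference between the shallow and the deep check, and the early return proposed in (3), can be sketched roughly as follows. This is only an illustration: the Inspector interface and the Node class are hypothetical stand-ins, not Hive's ObjectInspector classes, and the conversion step is a placeholder for whatever getConvertedOI() really does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SettableCheckSketch {
    /** Hypothetical stand-in for an object inspector; NOT Hive's API. */
    interface Inspector {
        // Shallow check: is this inspector itself of a settable type?
        boolean isSettable();
        // Nested inspectors (struct fields, union alternatives, ...).
        List<Inspector> children();
    }

    /** Hypothetical tree node used only for this illustration. */
    static final class Node implements Inspector {
        private final boolean settable;
        private final List<Inspector> children;

        Node(boolean settable, Inspector... children) {
            this.settable = settable;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }

        @Override public boolean isSettable() { return settable; }
        @Override public List<Inspector> children() { return children; }
    }

    // Deep check: this inspector and every nested inspector must be settable.
    static boolean hasAllFieldsSettable(Inspector oi) {
        if (!oi.isSettable()) {
            return false;
        }
        for (Inspector child : oi.children()) {
            if (!hasAllFieldsSettable(child)) {
                return false;
            }
        }
        return true;
    }

    // Early return from task (3): reuse outputOI when nothing needs converting.
    static Inspector getConvertedOI(Inspector outputOI) {
        if (hasAllFieldsSettable(outputOI)) {
            return outputOI; // no new objects are instantiated
        }
        // Placeholder conversion: rebuild the tree with settable nodes.
        List<Inspector> kids = outputOI.children();
        Inspector[] converted = new Inspector[kids.size()];
        for (int i = 0; i < converted.length; i++) {
            converted[i] = getConvertedOI(kids.get(i));
        }
        return new Node(true, converted);
    }
}
```

 With this shape, a fully settable tree is returned as the same instance, and only trees containing a non-settable inspector pay for a rebuild.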





[jira] [Commented] (HIVE-3404) Create quarter UDF

2015-04-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512314#comment-14512314
 ] 

Hive QA commented on HIVE-3404:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728128/HIVE-3404.2.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8817 tests executed
*Failed tests:*
{noformat}
TestDummy - did not produce a TEST-*.xml file
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3582/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3582/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3582/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728128 - PreCommit-HIVE-TRUNK-Build

 Create quarter UDF
 --

 Key: HIVE-3404
 URL: https://issues.apache.org/jira/browse/HIVE-3404
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Sanam Naz
Assignee: Alexander Pivovarov
 Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, 
 HIVE-3404.2.patch


 The function QUARTER(date) would return the quarter from a string / date / 
 timestamp. This will be useful for different domains such as retail, finance, etc.
 MySQL has a QUARTER function:
 https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter
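
 For illustration, the mapping from a date to its quarter is a small piece of integer arithmetic on the month. The sketch below is not the attached patch and does not use Hive's UDF API; the class and method names are hypothetical, and it assumes Java 8's java.time and an ISO yyyy-MM-dd input string.

```java
import java.time.LocalDate;

final class QuarterSketch {
    // Maps a 1-based month (1..12) to its calendar quarter (1..4).
    static int quarterOfMonth(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        return (month - 1) / 3 + 1;
    }

    // Parses an ISO date string (yyyy-MM-dd) and returns its quarter.
    static int quarterOf(String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return quarterOfMonth(date.getMonthValue());
    }

    public static void main(String[] args) {
        // April falls in the second quarter, so this prints 2.
        System.out.println(quarterOf("2015-04-25"));
    }
}
```

 Months 1-3 map to quarter 1, 4-6 to quarter 2, 7-9 to quarter 3, and 10-12 to quarter 4, which matches the behaviour documented for MySQL's QUARTER for valid dates.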





[jira] [Commented] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()

2015-04-25 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512568#comment-14512568
 ] 

Gabor Liptak commented on HIVE-7150:


I uploaded an updated patch (but the QA build didn't run ...)

 FileInputStream is not closed in HiveConnection#getHttpClient()
 ---

 Key: HIVE-7150
 URL: https://issues.apache.org/jira/browse/HIVE-7150
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
  Labels: jdbc
 Fix For: 1.2.0

 Attachments: HIVE-7150.1.patch


 Here is related code:
 {code}
 sslTrustStore.load(new FileInputStream(sslTrustStorePath),
 sslTrustStorePassword.toCharArray());
 {code}
 The FileInputStream is not closed upon returning from the method.
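
 One possible way to close the stream is try-with-resources, sketched below. This is not necessarily what the attached patch does: the TrustStoreLoader class and loadTrustStore method are hypothetical names for illustration, and the sketch assumes Java 7 or later and the JVM's default keystore type.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

final class TrustStoreLoader {
    // Loads a trust store from disk and closes the stream on every path,
    // including when KeyStore.load throws.
    static KeyStore loadTrustStore(String sslTrustStorePath, String sslTrustStorePassword)
            throws IOException, GeneralSecurityException {
        KeyStore sslTrustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = new FileInputStream(sslTrustStorePath)) {
            sslTrustStore.load(in, sslTrustStorePassword.toCharArray());
        }
        return sslTrustStore;
    }
}
```

 On a Java 6 code base the same effect would need an explicit try/finally that calls close() on the stream.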


