[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler
[ https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Mains updated HIVE-7805: --- Attachment: HIVE-7805.2.patch Finally got a chance to rebase against latest trunk. I'll update the review as well. Support running multiple scans in hbase-handler --- Key: HIVE-7805 URL: https://issues.apache.org/jira/browse/HIVE-7805 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Andrew Mains Assignee: Andrew Mains Attachments: HIVE-7805.1.patch, HIVE-7805.2.patch, HIVE-7805.patch Currently, the HiveHBaseTableInputFormat only supports running a single scan. This can be less efficient than running multiple disjoint scans in certain cases, particularly when using a composite row key. For instance, given a row key schema of: {code} struct<bucket int, time timestamp> {code} if one wants to push down the predicate: {code} bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670 {code} it's much more efficient to run a scan for each bucket over the time range (particularly if there's a large amount of data per day). With a single scan, the MR job has to process the data for all time for every bucket between 1 and 100. Hive should allow HBaseKeyFactory implementations to decompose a predicate into one or more scans in order to take advantage of this fact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
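The decomposition described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: ScanRange, MultiScanSketch, and the key encoding (4-byte big-endian bucket followed by 8-byte big-endian time, non-negative values only) are assumptions made for the example, and the real HBaseKeyFactory interface is not used here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the row-key bounds of one scan: [startRow, stopRow). */
final class ScanRange {
    final byte[] startRow; // inclusive
    final byte[] stopRow;  // exclusive

    ScanRange(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }
}

final class MultiScanSketch {
    /** Assumed key layout: 4-byte big-endian bucket, then 8-byte big-endian time. */
    static byte[] encodeKey(int bucket, long time) {
        return ByteBuffer.allocate(12).putInt(bucket).putLong(time).array();
    }

    /** Builds one range per bucket in the IN list, each bounded by the time predicate. */
    static List<ScanRange> decompose(int[] buckets, long fromInclusive, long toExclusive) {
        List<ScanRange> ranges = new ArrayList<>();
        for (int bucket : buckets) {
            ranges.add(new ScanRange(encodeKey(bucket, fromInclusive),
                                     encodeKey(bucket, toExclusive)));
        }
        return ranges;
    }
}
```

For the predicate in the description, decompose(new int[] {1, 10, 100}, 1408333927L, 1408506670L) yields three disjoint ranges, so rows for buckets 2 through 99 are never read.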
[jira] [Commented] (HIVE-3404) Create quarter UDF
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512704#comment-14512704 ] Hive QA commented on HIVE-3404: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12728168/HIVE-3404.3.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8818 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3584/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3584/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3584/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12728168 - PreCommit-HIVE-TRUNK-Build Create quarter UDF -- Key: HIVE-3404 URL: https://issues.apache.org/jira/browse/HIVE-3404 Project: Hive Issue Type: New Feature Components: UDF Reporter: Sanam Naz Assignee: Alexander Pivovarov Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, HIVE-3404.2.patch, HIVE-3404.3.patch The function QUARTER(date) would return the quarter from a string / date / timestamp. This will be useful for different domains like retail, finance, etc. MySQL has a QUARTER function: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10473) Spark client is recreated even spark configuration is not changed
[ https://issues.apache.org/jira/browse/HIVE-10473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512601#comment-14512601 ] Jimmy Xiang commented on HIVE-10473: If the new value is null, the set will fail at Configuration#set(String name, String value, String source), which checks that neither name nor value is null. Spark client is recreated even spark configuration is not changed - Key: HIVE-10473 URL: https://issues.apache.org/jira/browse/HIVE-10473 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-10473.1-spark.patch, HIVE-10473.1.patch Currently, we consider a Spark setting changed whenever the set method is called, even if we set it to the same value as before. We should also check whether the value actually changed, since it takes time to start a new Spark client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
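The check proposed in the description might look roughly like the sketch below. SparkConfTracker and its methods are hypothetical names invented for this example, and the "spark." key prefix is an assumption; none of this is taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical tracker: flags a restart only when a Spark setting really changes. */
final class SparkConfTracker {
    private final Map<String, String> settings = new HashMap<>();
    private boolean sparkConfChanged = false;

    void set(String name, String value) {
        // Mirrors the null checks that the comment above attributes to Configuration#set.
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(value, "value");
        String previous = settings.put(name, value);
        // Assumption: Spark-related keys start with "spark.".
        if (name.startsWith("spark.") && !value.equals(previous)) {
            sparkConfChanged = true;
        }
    }

    /** True if a new Spark client would have to be started. */
    boolean needsNewClient() {
        return sparkConfChanged;
    }

    /** Called after a client has been (re)started with the current settings. */
    void clientStarted() {
        sparkConfChanged = false;
    }
}
```

Setting a key to the value it already has leaves needsNewClient() false, which is the behavior the issue asks for.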
[jira] [Commented] (HIVE-10455) CBO (Calcite Return Path): Different data types at Reducer before JoinOp
[ https://issues.apache.org/jira/browse/HIVE-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512819#comment-14512819 ] Pengcheng Xiong commented on HIVE-10455: [~jcamachorodriguez], I agree with you and I uploaded a new patch. I assume that it can pass all the cbo tests after HIVE-10416 and HIVE-10479. Could you take another look? Thanks. CBO (Calcite Return Path): Different data types at Reducer before JoinOp Key: HIVE-10455 URL: https://issues.apache.org/jira/browse/HIVE-10455 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-10455.01.patch, HIVE-10455.02.patch The following error occurred for cbo_subq_not_in.q: {code} java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x1 with properties {columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=double} at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) {code} An easier way to reproduce it is: {code} set hive.cbo.enable=true; set hive.exec.check.crossproducts=false; set hive.stats.fetch.column.stats=true; set hive.auto.convert.join=false; select p_size, src.key from part join src on p_size=key; {code} As you can see, p_size is an integer while src.key is a string. Both should be cast to double for the join. When the return path is off, the cast happens before the Join, at the RS (ReduceSink). However, when the return path is on, the cast is treated as an expression inside the Join. Thus, when the reducer collects keys of different types from different join branches, it throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
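To illustrate why both join keys need a common type before they reach the reducer, here is a small sketch. It is not Hive code: JoinKeyCoercionSketch and its methods are hypothetical, and mapping a non-numeric string to null is an assumption made for the example.

```java
/** Hypothetical illustration: compare an int key and a string key as doubles. */
final class JoinKeyCoercionSketch {
    /** Casts a key to double; returns null for null or (by assumption) non-numeric input. */
    static Double toDoubleKey(Object key) {
        if (key == null) {
            return null;
        }
        if (key instanceof Number) {
            return ((Number) key).doubleValue();
        }
        try {
            return Double.valueOf(key.toString().trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Keys match only when both sides cast to the same non-null double. */
    static boolean keysMatch(Object left, Object right) {
        Double l = toDoubleKey(left);
        Double r = toDoubleKey(right);
        return l != null && r != null && l.doubleValue() == r.doubleValue();
    }
}
```

If only one branch were cast, the reducer would receive keys serialized with different types, which is the situation the stack trace above reports.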
[jira] [Updated] (HIVE-3404) Create quarter UDF
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-3404: -- Attachment: HIVE-3404.3.patch patch #3 - add VOID_GROUP to checkArgGroups - add null void type test Create quarter UDF -- Key: HIVE-3404 URL: https://issues.apache.org/jira/browse/HIVE-3404 Project: Hive Issue Type: New Feature Components: UDF Reporter: Sanam Naz Assignee: Alexander Pivovarov Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, HIVE-3404.2.patch, HIVE-3404.3.patch The function QUARTER(date) would return the quarter from a string / date / timestamp. This will be useful for different domains like retail, finance, etc. MySQL has a QUARTER function: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
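The quarter computation itself is simple arithmetic on the month. The sketch below shows it for ISO yyyy-MM-dd strings only; QuarterSketch is a hypothetical class, not the UDF in the patch, and returning null for a null argument is an assumption based on the null test mentioned in patch #3.

```java
import java.time.LocalDate;

/** Hypothetical illustration of QUARTER(date) for ISO-formatted date strings. */
final class QuarterSketch {
    /** Maps month 1..12 to quarter 1..4. */
    static int quarterOfMonth(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        return (month - 1) / 3 + 1;
    }

    /** Returns the quarter of a yyyy-MM-dd date, or null for null input (assumed). */
    static Integer quarter(String isoDate) {
        if (isoDate == null) {
            return null;
        }
        return quarterOfMonth(LocalDate.parse(isoDate).getMonthValue());
    }
}
```

With this mapping, January through March fall in quarter 1 and October through December in quarter 4.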
[jira] [Commented] (HIVE-10455) CBO (Calcite Return Path): Different data types at Reducer before JoinOp
[ https://issues.apache.org/jira/browse/HIVE-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512853#comment-14512853 ] Hive QA commented on HIVE-10455: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12728191/HIVE-10455.02.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3588/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3588/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3588/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3588/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-git-master-source ]] + [[ ! -d apache-git-master-source/.git ]] + [[ ! -d apache-git-master-source ]] + cd apache-git-master-source + git fetch origin + git reset --hard HEAD HEAD is now at 123bb8e Preparing for 1.3.0 development + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 123bb8e Preparing for 1.3.0 development + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12728191 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Different data types at Reducer before JoinOp Key: HIVE-10455 URL: https://issues.apache.org/jira/browse/HIVE-10455 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-10455.01.patch, HIVE-10455.02.patch The following error occurred for cbo_subq_not_in.q: {code} java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x1 with properties {columns=reducesinkkey0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=double} at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) {code} An easier way to reproduce it is: {code} set hive.cbo.enable=true; set hive.exec.check.crossproducts=false; set 
hive.stats.fetch.column.stats=true; set hive.auto.convert.join=false; select p_size, src.key from part join src on p_size=key; {code} As you can see, p_size is an integer while src.key is a string. Both should be cast to double for the join. When the return path is off, the cast happens before the Join, at the RS (ReduceSink). However, when the return path is on, the cast is treated as an expression inside the Join. Thus, when the reducer collects keys of different types from different join branches, it throws an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10486) Update wiki for switch from svn to git
[ https://issues.apache.org/jira/browse/HIVE-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512860#comment-14512860 ] Lefty Leverenz commented on HIVE-10486: --- These wikidocs need to be revised:
# [Developer Guide | https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide] -- 2 instances of svn
# [Hive Developer FAQ | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ] -- 3 instances of svn
# [How To Contribute | https://cwiki.apache.org/confluence/display/Hive/HowToContribute] -- 17 instances of svn
# [How To Commit | https://cwiki.apache.org/confluence/display/Hive/HowToCommit] -- 12 instances of svn (first one is an obsolete URL for credits.xml, whole paragraph needs revision; Committing Documentation also needs complete revision now that docs are in the wiki)
# [How To Release | https://cwiki.apache.org/confluence/display/Hive/HowToRelease] -- 22 instances of svn
# [How to edit the website | https://cwiki.apache.org/confluence/display/Hive/How+to+edit+the+website] -- 2 instances of svn
# [Hive PreCommit Patch Testing | https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing] -- 1 instance of svn
# [Jenkins Script | https://cwiki.apache.org/confluence/display/Hive/Jenkins+Script] -- 2 instances of svn
# [Getting Started | https://cwiki.apache.org/confluence/display/Hive/GettingStarted] -- 7 instances of svn
# [Admin Manual Installation | https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation] -- 4 instances of svn
# [Hive Web Interface | https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface] -- 1 instance of svn
# [Generic UDAF Case Study | https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy] -- 4 instances of svn
# [WebHCat Configure | https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure] -- 4 instances of svn
Update wiki for switch from svn to git -- Key: HIVE-10486 URL: https://issues.apache.org/jira/browse/HIVE-10486 Project: Hive Issue Type: Bug Reporter: Lefty Leverenz The Hive wiki has many svn instructions that need to be changed to their git equivalents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10485) Create md5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10485: --- Description: MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ Create md5 UDF -- Key: HIVE-10485 URL: https://issues.apache.org/jira/browse/HIVE-10485 Project: Hive Issue Type: Task Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10485) Create md5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10485: --- Description: MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html was: MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ Create md5 UDF -- Key: HIVE-10485 URL: https://issues.apache.org/jira/browse/HIVE-10485 Project: Hive Issue Type: Task Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10485) Create md5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10485: --- Description: MD5(str) Calculates an MD5 128-bit checksum for UTF-8 string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html was: MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html Create md5 UDF -- Key: HIVE-10485 URL: https://issues.apache.org/jira/browse/HIVE-10485 Project: Hive Issue Type: Task Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov MD5(str) Calculates an MD5 128-bit checksum for UTF-8 string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. 
Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
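The behavior described (UTF-8 input, 32 lowercase hex digits, NULL in and NULL out) can be sketched with the JDK's MessageDigest. Md5Sketch is a hypothetical helper for illustration, not the UDF class from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of MD5(str) as described in the issue. */
final class Md5Sketch {
    /** Returns the MD5 digest of the UTF-8 bytes as 32 hex digits, or null for null. */
    static String md5Hex(String s) {
        if (s == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Encoding the string explicitly as UTF-8 keeps the result independent of the platform default charset, which matters for non-ASCII input.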
[jira] [Updated] (HIVE-10477) Provide option to disable Spark tests
[ https://issues.apache.org/jira/browse/HIVE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10477: - Attachment: HIVE-10477.01.patch Provide option to disable Spark tests -- Key: HIVE-10477 URL: https://issues.apache.org/jira/browse/HIVE-10477 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10477.01.patch The following is one of the reasons why we might want to provide an option to disable Spark tests: in the current master branch, unit tests fail on Windows because of the dependency on the bash executable in itests/hive-unit/pom.xml around these lines: {code} <target> <exec executable="bash" dir="${basedir}" failonerror="true"> <arg line="../target/download.sh"/> </exec> </target> {code} We should provide an option to disable Spark tests on OSes like Windows where bash might be absent. That said, Spark tests will remain enabled by default in pre-commit test runs and should continue to work as they do in the master branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512357#comment-14512357 ] Gopal V commented on HIVE-9645: --- [~apivovarov]: the VOID handling cases were only done partly, since the newest patch generates ((int)null) instead of void(null) for column types. Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Fix For: 1.2.0 Attachments: HIVE-9645.1.patch, HIVE-9645.2.patch, HIVE-9645.3.patch, HIVE-9645.4.patch, HIVE-9645.5.patch, HIVE-9645.6.patch, HIVE-9645.7.patch, HIVE-9645.patch Hive logical optimizer does not follow the Null scan codepath when encountering a NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512312#comment-14512312 ] Alexander Pivovarov commented on HIVE-9645: --- Why was VOID added to obtainIntConverter but not to obtainLongConverter in GenericUDF? Same question for obtainDateConverter and obtainTimestampConverter. Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Fix For: 1.2.0 Attachments: HIVE-9645.1.patch, HIVE-9645.2.patch, HIVE-9645.3.patch, HIVE-9645.4.patch, HIVE-9645.5.patch, HIVE-9645.6.patch, HIVE-9645.7.patch, HIVE-9645.patch Hive logical optimizer does not follow the Null scan codepath when encountering a NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
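The reason a filter on NULL = 1 can never pass follows from SQL three-valued logic, sketched below. NullFoldingSketch is a hypothetical illustration of the reasoning only; it does not reflect how Hive's constant propagation or null-scan optimization is implemented.

```java
/** Hypothetical illustration of folding a constant comparison that involves NULL. */
final class NullFoldingSketch {
    /** SQL equality: the result is unknown (null) if either operand is NULL. */
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null;
        }
        return a.equals(b);
    }

    /** A WHERE clause keeps a row only when the predicate is TRUE, not unknown. */
    static boolean filterPasses(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    /** A constant predicate that can never pass means the table scan can be skipped. */
    static boolean canSkipScan(Object constLeft, Object constRight) {
        return !filterPasses(sqlEquals(constLeft, constRight));
    }
}
```

Here canSkipScan(null, 1) is true, so for filtering purposes the folded predicate behaves like false and no rows need to be read.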
[jira] [Updated] (HIVE-10485) Create md5 UDF
[ https://issues.apache.org/jira/browse/HIVE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10485: --- Attachment: HIVE-10485.1.patch patch #1 Create md5 UDF -- Key: HIVE-10485 URL: https://issues.apache.org/jira/browse/HIVE-10485 Project: Hive Issue Type: Task Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10485.1.patch MD5(str) Calculates an MD5 128-bit checksum for the string. The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. The return value can, for example, be used as a hash key. Example: {code} SELECT MD5('udf_md5'); 'ce62ef0d2d27dc37b6d488b92f4b24fd' {code} online md5 generator: http://www.md5.cz/ MySQL has md5 function: https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_md5 PostgreSQL also has md5 function: http://www.postgresql.org/docs/9.1/static/functions-string.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6774) Not a valid JAR errors from TestExecDriver
[ https://issues.apache.org/jira/browse/HIVE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512368#comment-14512368 ] Jan Morlock commented on HIVE-6774: --- I find this very annoying. Every Hive newcomer who reads the GettingStarted guide and follows its instructions will run into this frustrating situation. See for example http://stackoverflow.com/questions/25353207/hive-testmapplan1org-apache-hadoop-hive-ql-exec-testexecdriver-failed Not a valid JAR errors from TestExecDriver Key: HIVE-6774 URL: https://issues.apache.org/jira/browse/HIVE-6774 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere If I wipe out my local Maven repository and run the command: mvn clean install -Dtest=TestExecDriver -Phadoop-1 All of the TestExecDriver tests fail with the following errors: {noformat} Not a valid JAR: /Users/jdere/.m2/repository/org/apache/hive/hive-exec/0.14.0-SNAPSHOT/hive-exec-0.14.0-SNAPSHOT.jar Execution failed with exit status: 255 Obtaining error information Task failed! 
Task ID: null Logs: /Users/jdere/dev/hive.git/ql/target/tmp/log/hive.log java.lang.NullPointerException at org.apache.hadoop.hive.ql.session.SessionState.addLocalMapRedErrors(SessionState.java:919) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:282) at org.apache.hadoop.hive.ql.exec.TestExecDriver.executePlan(TestExecDriver.java:460) at org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1(TestExecDriver.java:474) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.
[ https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-5202: Issue Type: Improvement (was: Bug) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types. --- Key: HIVE-5202 URL: https://issues.apache.org/jira/browse/HIVE-5202 Project: Hive Issue Type: Improvement Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 0.13.0 Attachments: HIVE-5202.2.patch.txt, HIVE-5202.patch These 3 tasks should be accomplished as part of this jira: 1. The current implementation lacks a settable union object inspector. We can run into an exception inside ObjectInspectorConverters.getConvertedOI() if there is a union. 2. Implement the following public functions for all datatypes: isSettable(), a shallow check of whether an object inspector inherits from the settableOI type, and hasAllFieldsSettable(), a deep check of whether this object inspector and all of its underlying object inspectors inherit from the settableOI type. 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and (2) are implemented, add a check so that getConvertedOI() returns outputOI immediately when outputOI.hasAllFieldsSettable() is true, which prevents redundant object instantiation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
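The difference between the shallow and deep checks, and the early return proposed in task 3, can be sketched as follows. InspectorSketch is a hypothetical stand-in written for this example; it does not use Hive's ObjectInspector classes, and toSettable() is an invented helper representing the conversion step.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an object inspector with nested field inspectors. */
final class InspectorSketch {
    private final boolean settable;
    private final List<InspectorSketch> fields;

    InspectorSketch(boolean settable, List<InspectorSketch> fields) {
        this.settable = settable;
        this.fields = fields;
    }

    /** Shallow check: looks only at this inspector. */
    boolean isSettable() {
        return settable;
    }

    /** Deep check: this inspector and every nested inspector must be settable. */
    boolean hasAllFieldsSettable() {
        if (!settable) {
            return false;
        }
        for (InspectorSketch field : fields) {
            if (!field.hasAllFieldsSettable()) {
                return false;
            }
        }
        return true;
    }

    /** Invented helper: builds a fully settable copy of this inspector tree. */
    InspectorSketch toSettable() {
        List<InspectorSketch> converted = new ArrayList<>();
        for (InspectorSketch field : fields) {
            converted.add(field.toSettable());
        }
        return new InspectorSketch(true, converted);
    }

    /** Returns outputOI unchanged when nothing needs converting (task 3). */
    static InspectorSketch getConvertedOI(InspectorSketch outputOI) {
        return outputOI.hasAllFieldsSettable() ? outputOI : outputOI.toSettable();
    }
}
```

A struct inspector that is itself settable but contains a non-settable field passes isSettable() and fails hasAllFieldsSettable(), which is why the deep check is the one that guards the early return.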
[jira] [Commented] (HIVE-3404) Create quarter UDF
[ https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512314#comment-14512314 ] Hive QA commented on HIVE-3404: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12728128/HIVE-3404.2.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8817 tests executed *Failed tests:* {noformat} TestDummy - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3582/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3582/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3582/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12728128 - PreCommit-HIVE-TRUNK-Build Create quarter UDF -- Key: HIVE-3404 URL: https://issues.apache.org/jira/browse/HIVE-3404 Project: Hive Issue Type: New Feature Components: UDF Reporter: Sanam Naz Assignee: Alexander Pivovarov Attachments: HIVE-3404.1.patch.txt, HIVE-3404.2.patch, HIVE-3404.2.patch The function QUARTER(date) would return the quarter from a string / date / timestamp. This will be useful for different domains like retail ,finance etc. MySQL has QUARTER function https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_quarter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
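As a rough illustration of the requested behaviour, the quarter can be derived from the month number with integer arithmetic. This is a minimal sketch and not the code in the attached patches: the class and method names are hypothetical, it assumes Java 8's `java.time` API and ISO `yyyy-MM-dd` strings, and returning null for unparseable input is an assumption about the desired semantics.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical sketch of QUARTER(date); not the implementation from HIVE-3404.
final class QuarterSketch {

    /** Maps months 1-3 to quarter 1, 4-6 to 2, 7-9 to 3, and 10-12 to 4. */
    static int quarterOf(LocalDate date) {
        return (date.getMonthValue() - 1) / 3 + 1;
    }

    /**
     * String variant. Assumes ISO yyyy-MM-dd input and returns null when the
     * input is null or cannot be parsed (an assumed convention for this sketch).
     */
    static Integer quarterOf(String text) {
        if (text == null) {
            return null;
        }
        try {
            return quarterOf(LocalDate.parse(text));
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

The JDK also exposes the same value through `date.get(IsoFields.QUARTER_OF_YEAR)`; the arithmetic form is used here only to make the month-to-quarter mapping explicit.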
[jira] [Commented] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512568#comment-14512568 ] Gabor Liptak commented on HIVE-7150: I uploaded an updated patch (but the QA build didn't run ...) FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Reporter: Ted Yu Labels: jdbc Fix For: 1.2.0 Attachments: HIVE-7150.1.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
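One common way to close such a stream is try-with-resources, sketched below. This is not the content of HIVE-7150.1.patch: the class and method names are hypothetical, the sketch assumes Java 7 or later, and the use of `KeyStore.getDefaultType()` is an assumption, since the quoted snippet does not show how `sslTrustStore` is created.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical sketch; not the actual HiveConnection code or the attached patch.
final class TrustStoreLoadSketch {

    /** Loads a trust store from a file and always closes the underlying stream. */
    static KeyStore loadTrustStore(String sslTrustStorePath, String sslTrustStorePassword)
            throws IOException, GeneralSecurityException {
        // Assumption: the default key store type; the real code may request a specific type.
        KeyStore sslTrustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        // try-with-resources closes the stream on normal return and when load() throws.
        try (InputStream in = new FileInputStream(sslTrustStorePath)) {
            sslTrustStore.load(in, sslTrustStorePassword.toCharArray());
        }
        return sslTrustStore;
    }
}
```

On a runtime without try-with-resources, the same effect would need an explicit `finally` block that closes the stream.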