[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec
[ https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1691: - Attachment: HIVE-1691.patch > ANALYZE TABLE command should check columns in partition spec > - > > Key: HIVE-1691 > URL: https://issues.apache.org/jira/browse/HIVE-1691 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1691.patch > > > ANALYZE TABLE PARTITION (col1, col2,...) should check whether col1, col2 etc > are partition columns. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec
[ https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1691: - Status: Patch Available (was: Open) > ANALYZE TABLE command should check columns in partition spec > - > > Key: HIVE-1691 > URL: https://issues.apache.org/jira/browse/HIVE-1691 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1691.patch > > > ANALYZE TABLE PARTITION (col1, col2,...) should check whether col1, col2 etc > are partition columns. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec
ANALYZE TABLE command should check columns in partition spec - Key: HIVE-1691 URL: https://issues.apache.org/jira/browse/HIVE-1691 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang ANALYZE TABLE PARTITION (col1, col2,...) should check whether col1, col2 etc are partition columns. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
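The requested check can be sketched in isolation (a minimal illustration; the class and method names here are hypothetical, not Hive's actual semantic-analyzer code):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Sketch of the validation this issue asks for: every column named in
// the PARTITION(...) spec of ANALYZE TABLE must be an actual partition
// column of the table. Names here are illustrative, not Hive's API.
public class PartitionSpecCheck {
    public static void validate(Set<String> partitionColumns, Collection<String> specColumns) {
        for (String col : specColumns) {
            if (!partitionColumns.contains(col)) {
                throw new IllegalArgumentException(col + " is not a partition column");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> partCols = new HashSet<>(Arrays.asList("ds", "hr"));
        validate(partCols, Arrays.asList("ds", "hr"));   // ok: both are partition columns
        try {
            validate(partCols, Arrays.asList("val"));    // rejected: not a partition column
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```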
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1376: - Status: Patch Available (was: Open) > Simple UDAFs with more than 1 parameter crash on empty row query > - > > Key: HIVE-1376 > URL: https://issues.apache.org/jira/browse/HIVE-1376 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Ning Zhang > Attachments: HIVE-1376.2.patch, HIVE-1376.patch > > > Simple UDAFs with more than 1 parameter crash when the query returns no rows. > Currently, this only seems to affect the percentile() UDAF where the second > parameter is the percentile to be computed (of type double). I've also > verified the bug by adding a dummy parameter to ExampleMin in contrib. > On an empty query, Hive seems to be trying to resolve an iterate() method > with signature {null,null} instead of {null,double}. You can reproduce this > bug using: > CREATE TABLE pct_test ( val INT ); > SELECT percentile(val, 0.5) FROM pct_test; > which produces a lot of errors like: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to > execute method public boolean > org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) > on object > org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 > of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator > with arguments {null, null} of size 2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
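The resolution failure quoted in this issue can be reproduced outside Hive with plain reflection. The stub class below is a hypothetical stand-in for the percentile evaluator; the point is that a primitive double parameter cannot receive null:

```java
import java.lang.reflect.Method;

// Reproduces the failure mode from the stack trace above: invoking an
// iterate() whose second parameter is a primitive double with a null
// argument fails, because null cannot be unboxed into a primitive.
public class IterateResolutionDemo {
    public static class EvaluatorStub {
        public boolean iterate(Long value, double percentile) { return true; }
    }

    // Returns true when iterate(null, null) fails, mirroring the
    // "with arguments {null, null} of size 2" error in this issue.
    public static boolean failsOnAllNulls() {
        try {
            Method iterate = EvaluatorStub.class.getMethod("iterate", Long.class, double.class);
            Object stub = new EvaluatorStub();
            iterate.invoke(stub, 42L, 0.5);      // a real row: both arguments known
            iterate.invoke(stub, null, null);    // empty result set: both arguments null
            return false;
        } catch (IllegalArgumentException e) {   // null cannot unbox to double
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("iterate(null, null) fails: " + failsOnAllNulls());
    }
}
```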
[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1674: - Attachment: HIVE-1674.patch > count(*) returns wrong result when a mapper returns empty results > - > > Key: HIVE-1674 > URL: https://issues.apache.org/jira/browse/HIVE-1674 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1674.patch > > > select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1674: - Status: Patch Available (was: Open) > count(*) returns wrong result when a mapper returns empty results > - > > Key: HIVE-1674 > URL: https://issues.apache.org/jira/browse/HIVE-1674 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1674.patch > > > select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
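One plausible way such a symptom can arise is sketched below as a toy two-stage count (this is a model of the described behavior, not Hive's actual plan): each mapper emits a partial-count row even when it matched nothing, and the final stage must sum those partials rather than count the rows that carry them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-stage count(*): each of four mappers matched no rows but
// still emits a partial count of 0. Summing the partials gives the
// right answer; counting the partial rows gives one per mapper, which
// matches the symptom this issue describes.
public class EmptyMapperCount {
    static long buggyFinal(List<Long> partials) {
        return partials.size();                  // counts one row per mapper
    }
    static long correctFinal(List<Long> partials) {
        long sum = 0;
        for (long p : partials) sum += p;        // sums the partial counts
        return sum;
    }
    public static void main(String[] args) {
        List<Long> partials = new ArrayList<>();
        for (int mapper = 0; mapper < 4; mapper++) {
            partials.add(0L);                    // every mapper's input is empty
        }
        System.out.println(buggyFinal(partials));   // 4 (number of mappers)
        System.out.println(correctFinal(partials)); // 0
    }
}
```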
[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917001#action_12917001 ] Ning Zhang commented on HIVE-1611: -- Thanks for the link, Alex. I've talked to Ashish and he said Hive has just been approved as a TLP. There might be some work that needs to be done to move the wiki and all the documentation (I think Edward Capriolo has volunteered to do so?). Let me ask Edward and see what he thinks. > Add alternative search-provider to Hive site > > > Key: HIVE-1611 > URL: https://issues.apache.org/jira/browse/HIVE-1611 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Alex Baranau >Assignee: Alex Baranau >Priority: Minor > Attachments: HIVE-1611.patch > > > Use search-hadoop.com service to make available search in Hive sources, MLs, > wiki, etc. > This was initially proposed on user mailing list. The search service was > already added in site's skin (common for all Hadoop related projects) before > so this issue is about enabling it for Hive. The ultimate goal is to use it > at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1684) intermittent failures in create_escape.q
[ https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1684. -- Resolution: Duplicate duplicate of HIVE-1669. > intermittent failures in create_escape.q > > > Key: HIVE-1684 > URL: https://issues.apache.org/jira/browse/HIVE-1684 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: He Yongqiang > > [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I > lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime > -I Location -I transient_lastDdlTime -I last_modified_ -I > java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I > Caused by: -I [.][.][.] [0-9]* more > /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out > > /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out > [junit] 48d47 > [junit] < serialization.format\t > [junit] 49a49 > [junit] > serialization.format\t > Sometimes, I see the above failure. > It does not always happen, and needs to be investigated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1684) intermittent failures in create_escape.q
[ https://issues.apache.org/jira/browse/HIVE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916990#action_12916990 ] Ning Zhang commented on HIVE-1684: -- This is the same as HIVE-1669, which was introduced by the new desc extended feature. It should be addressed by HIVE-1658. > intermittent failures in create_escape.q > > > Key: HIVE-1684 > URL: https://issues.apache.org/jira/browse/HIVE-1684 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: He Yongqiang > > [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I > lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime > -I Location -I transient_lastDdlTime -I last_modified_ -I > java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I > Caused by: -I [.][.][.] [0-9]* more > /data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out > > /data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out > [junit] 48d47 > [junit] < serialization.format\t > [junit] 49a49 > [junit] > serialization.format\t > Sometimes, I see the above failure. > It does not always happen, and needs to be investigated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1376: - Attachment: HIVE-1376.2.patch The previous patch failed on several tests, particularly count(*) queries. Attaching a new patch for percentile only and will upload a patch for HIVE-1674 separately. > Simple UDAFs with more than 1 parameter crash on empty row query > - > > Key: HIVE-1376 > URL: https://issues.apache.org/jira/browse/HIVE-1376 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Ning Zhang > Attachments: HIVE-1376.2.patch, HIVE-1376.patch > > > Simple UDAFs with more than 1 parameter crash when the query returns no rows. > Currently, this only seems to affect the percentile() UDAF where the second > parameter is the percentile to be computed (of type double). I've also > verified the bug by adding a dummy parameter to ExampleMin in contrib. > On an empty query, Hive seems to be trying to resolve an iterate() method > with signature {null,null} instead of {null,double}. You can reproduce this > bug using: > CREATE TABLE pct_test ( val INT ); > SELECT percentile(val, 0.5) FROM pct_test; > which produces a lot of errors like: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to > execute method public boolean > org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) > on object > org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 > of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator > with arguments {null, null} of size 2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916927#action_12916927 ] Ning Zhang commented on HIVE-1611: -- Hi Alex, some questions: - Hive doesn't have the file author/src/documentation/skinconf.xml, which is included in the patch. How does this work? - The patch and comments suggest it is intended for Hadoop subprojects. Hive is transitioning to a TLP independent of Hadoop. Will there be an issue after the transition? > Add alternative search-provider to Hive site > > > Key: HIVE-1611 > URL: https://issues.apache.org/jira/browse/HIVE-1611 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Alex Baranau >Assignee: Alex Baranau >Priority: Minor > Attachments: HIVE-1611.patch > > > Use search-hadoop.com service to make available search in Hive sources, MLs, > wiki, etc. > This was initially proposed on user mailing list. The search service was > already added in site's skin (common for all Hadoop related projects) before > so this issue is about enabling it for Hive. The ultimate goal is to use it > at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916798#action_12916798 ] Ning Zhang commented on HIVE-1526: -- I will take a look. > Hive should depend on a release version of Thrift > - > > Key: HIVE-1526 > URL: https://issues.apache.org/jira/browse/HIVE-1526 > Project: Hadoop Hive > Issue Type: Task > Components: Build Infrastructure, Clients >Reporter: Carl Steinbach >Assignee: Todd Lipcon > Fix For: 0.7.0 > > Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, > libthrift.jar > > > Hive should depend on a release version of Thrift, and ideally it should use > Ivy to resolve this dependency. > The Thrift folks are working on adding Thrift artifacts to a maven repository > here: https://issues.apache.org/jira/browse/THRIFT-363 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1376: - Status: Open (was: Patch Available) > Simple UDAFs with more than 1 parameter crash on empty row query > - > > Key: HIVE-1376 > URL: https://issues.apache.org/jira/browse/HIVE-1376 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Ning Zhang > Attachments: HIVE-1376.patch > > > Simple UDAFs with more than 1 parameter crash when the query returns no rows. > Currently, this only seems to affect the percentile() UDAF where the second > parameter is the percentile to be computed (of type double). I've also > verified the bug by adding a dummy parameter to ExampleMin in contrib. > On an empty query, Hive seems to be trying to resolve an iterate() method > with signature {null,null} instead of {null,double}. You can reproduce this > bug using: > CREATE TABLE pct_test ( val INT ); > SELECT percentile(val, 0.5) FROM pct_test; > which produces a lot of errors like: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to > execute method public boolean > org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) > on object > org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 > of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator > with arguments {null, null} of size 2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS
[ https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1157: - Status: Open (was: Patch Available) > UDFs can't be loaded via "add jar" when jar is on HDFS > -- > > Key: HIVE-1157 > URL: https://issues.apache.org/jira/browse/HIVE-1157 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Philip Zeyliger >Priority: Minor > Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, > HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, > output.txt > > > As discussed on the mailing list, it would be nice if you could use UDFs that > are on jars on HDFS. The proposed implementation would be for "add jar" to > recognize that the target file is on HDFS, copy it locally, and load it into > the classpath. > {quote} > Hi folks, > I have a quick question about UDF support in Hive. I'm on the 0.5 branch. > Can you use a UDF where the jar which contains the function is on HDFS, and > not on the local filesystem. Specifically, the following does not seem to > work: > # This is Hive 0.5, from svn > $bin/hive > Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt > hive> add jar hdfs://localhost/FooTest.jar; > > Added hdfs://localhost/FooTest.jar to class path > hive> create temporary function cube as 'com.cloudera.FooTestUDF'; > > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.FunctionTask > Does this work for other people? I could probably fix it by changing "add > jar" to download remote jars locally, when necessary (to load them into the > classpath), or update URLClassLoader (or whatever is underneath there) to > read directly from HDFS, which seems a bit more fragile. But I wanted to > make sure that my interpretation of what's going on is right before I have at > it. > Thanks, > -- Philip > {quote} > {quote} > Yes that's correct. I prefer to download the jars in "add jar". 
> Zheng > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
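The "download remote jars locally" approach proposed above might look roughly like this (a sketch only: the HDFS download is merely modeled by creating a local temp file, since the real code would call Hadoop's FileSystem API, e.g. copyToLocalFile):

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;

// Sketch of the proposed "add jar" behavior: recognize a remote URI,
// stage the jar to a local temp file, then extend the classpath with
// the local copy via a URLClassLoader.
public class AddRemoteJar {
    public static File localize(String uri) {
        if (uri.startsWith("hdfs://")) {
            try {
                // stand-in for downloading the jar from HDFS
                return Files.createTempFile("addjar", ".jar").toFile();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new File(uri);                    // already local
    }

    public static URLClassLoader extendClasspath(ClassLoader parent, File jar) {
        try {
            return new URLClassLoader(new URL[] { jar.toURI().toURL() }, parent);
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File jar = localize("hdfs://localhost/FooTest.jar");
        URLClassLoader loader = extendClasspath(AddRemoteJar.class.getClassLoader(), jar);
        System.out.println(jar.getName().endsWith(".jar") && loader != null);
    }
}
```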
[jira] Commented: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)
[ https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916742#action_12916742 ] Ning Zhang commented on HIVE-1427: -- Carl, this is the only 0.6 blocker that doesn't have a patch available. Can you work on this as a hi-pri? > Provide metastore schema migration scripts (0.5 -> 0.6) > --- > > Key: HIVE-1427 > URL: https://issues.apache.org/jira/browse/HIVE-1427 > Project: Hadoop Hive > Issue Type: Task > Components: Metastore >Reporter: Carl Steinbach >Assignee: Carl Steinbach > Fix For: 0.6.0 > > > At a minimum this ticket covers packaging up example MySQL migration scripts > (cumulative across all schema changes from 0.5 to 0.6) and explaining what to > do with them in the release notes. > This is also probably a good point at which to decide and clearly state which > Metastore DBs we officially support in production, e.g. do we need to provide > migration scripts for Derby? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916741#action_12916741 ] Ning Zhang commented on HIVE-1526: -- Carl and Todd, is this a blocking issue for 0.6? If not, we can move it to 0.7 and get the 0.6 release out ASAP. > Hive should depend on a release version of Thrift > - > > Key: HIVE-1526 > URL: https://issues.apache.org/jira/browse/HIVE-1526 > Project: Hadoop Hive > Issue Type: Task > Components: Build Infrastructure, Clients >Reporter: Carl Steinbach >Assignee: Todd Lipcon > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, > libthrift.jar > > > Hive should depend on a release version of Thrift, and ideally it should use > Ivy to resolve this dependency. > The Thrift folks are working on adding Thrift artifacts to a maven repository > here: https://issues.apache.org/jira/browse/THRIFT-363 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1376: - Status: Patch Available (was: Open) Assignee: Ning Zhang > Simple UDAFs with more than 1 parameter crash on empty row query > - > > Key: HIVE-1376 > URL: https://issues.apache.org/jira/browse/HIVE-1376 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri >Assignee: Ning Zhang > Attachments: HIVE-1376.patch > > > Simple UDAFs with more than 1 parameter crash when the query returns no rows. > Currently, this only seems to affect the percentile() UDAF where the second > parameter is the percentile to be computed (of type double). I've also > verified the bug by adding a dummy parameter to ExampleMin in contrib. > On an empty query, Hive seems to be trying to resolve an iterate() method > with signature {null,null} instead of {null,double}. You can reproduce this > bug using: > CREATE TABLE pct_test ( val INT ); > SELECT percentile(val, 0.5) FROM pct_test; > which produces a lot of errors like: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to > execute method public boolean > org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) > on object > org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 > of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator > with arguments {null, null} of size 2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1376: - Attachment: HIVE-1376.patch Attaching a patch for review. This patch also fixes HIVE-1674 (count(*) returning wrong results). Tests are still running; I will upload a new patch if there are more changes. This patch implements option 3) as suggested, so SELECT PERCENTILE(col, 0.5) FROM src WHERE false; will return a single row with NULL as the value. > Simple UDAFs with more than 1 parameter crash on empty row query > - > > Key: HIVE-1376 > URL: https://issues.apache.org/jira/browse/HIVE-1376 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.6.0 >Reporter: Mayank Lahiri > Attachments: HIVE-1376.patch > > > Simple UDAFs with more than 1 parameter crash when the query returns no rows. > Currently, this only seems to affect the percentile() UDAF where the second > parameter is the percentile to be computed (of type double). I've also > verified the bug by adding a dummy parameter to ExampleMin in contrib. > On an empty query, Hive seems to be trying to resolve an iterate() method > with signature {null,null} instead of {null,double}. You can reproduce this > bug using: > CREATE TABLE pct_test ( val INT ); > SELECT percentile(val, 0.5) FROM pct_test; > which produces a lot of errors like: > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to > execute method public boolean > org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) > on object > org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 > of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator > with arguments {null, null} of size 2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1674: Assignee: Ning Zhang > count(*) returns wrong result when a mapper returns empty results > - > > Key: HIVE-1674 > URL: https://issues.apache.org/jira/browse/HIVE-1674 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > > select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
count(*) returns wrong result when a mapper returns empty results - Key: HIVE-1674 URL: https://issues.apache.org/jira/browse/HIVE-1674 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs
[ https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916256#action_12916256 ] Ning Zhang commented on HIVE-1638: -- Siying, great work! Also, can you add an optimization for the case when the parameters are constants (e.g., the 2nd parameter of f_c='5015')? The ObjectInspector doesn't have the information of whether an input parameter is constant or not, but I think if you check in evaluate() whether the parameter is the same *object* between the 1st and 2nd rows, you can conclude that the parameter is a constant. This can save a lot of object construction. > convert commonly used udfs to generic udfs > -- > > Key: HIVE-1638 > URL: https://issues.apache.org/jira/browse/HIVE-1638 > Project: Hadoop Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Namit Jain >Assignee: Siying Dong > Attachments: HIVE-1638.1.patch > > > Copying a mail from Joy: > i did a little bit of profiling of a simple hive group by query today. i was > surprised to see that one of the most expensive functions were in converting > the equals udf (i had some simple string filters) to generic udfs. > (primitiveobjectinspectorconverter.textconverter) > am i correct in thinking that the fix is to simply port some of the most > popular udfs (string equality/comparison etc.) to generic udsf? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
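The object-identity trick suggested in that comment can be sketched in isolation (hypothetical names; real code would live in the generic UDF's evaluate() and cache the converted value rather than a String):

```java
// Sketch of the suggestion above: if the same parameter *object* shows
// up again on the next row, treat it as a constant and reuse the cached
// converted value instead of converting on every row. String.valueOf()
// here is a stand-in for a costly type conversion.
public class ConstantParamCache {
    private Object lastParam;
    private String cachedConverted;
    private int conversions = 0;

    public String convert(Object param) {
        if (param != null && param == lastParam) { // identity check, not equals()
            return cachedConverted;                // constant across rows: reuse
        }
        lastParam = param;
        conversions++;                             // track how often we really convert
        cachedConverted = String.valueOf(param);
        return cachedConverted;
    }

    public int conversionCount() { return conversions; }

    public static void main(String[] args) {
        ConstantParamCache cache = new ConstantParamCache();
        String constant = "5015";                  // the same object on every row
        for (int row = 0; row < 1000; row++) {
            cache.convert(constant);
        }
        System.out.println(cache.conversionCount()); // 1
    }
}
```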
[jira] Commented: (HIVE-1665) drop operations may cause file leak
[ https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916249#action_12916249 ] Ning Zhang commented on HIVE-1665: -- What if 2) fails and rolling back 1) also fails? This could happen if the CLI gets killed at any time between 1) and 2). Another option is to use the traditional 'mark-then-delete' trick: mark the partition as deleted in the metastore first and then clean up the data. In case of any failure, redoing the drop partition will resume the data deletion process. It is also easier from the administrator's point of view: you can periodically check the metastore for deleted partitions (which are left uncommitted) and re-drop them. > drop operations may cause file leak > --- > > Key: HIVE-1665 > URL: https://issues.apache.org/jira/browse/HIVE-1665 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang >Assignee: He Yongqiang > Attachments: hive-1665.1.patch > > > Right now when doing a drop, Hive first drops metadata and then drops the > actual files. If file system is down at that time, the files will keep not > deleted. > Had an offline discussion about this: > to fix this, add a new conf "scratch dir" into hive conf. > when doing a drop operation: > 1) move data to scratch directory > 2) drop metadata > 3) if 2) failed, roll back 1) and report error 3.1 > if 2) succeeded, drop data from scratch directory 3.2 > 4) if 3.2 fails, we are ok because we assume the scratch dir will be emptied > manually. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
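The mark-then-delete alternative can be sketched as a tiny state machine (illustrative only: the durable metastore flag is modeled with a map, and file deletion with a no-op):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the 'mark-then-delete' protocol: record the deletion in the
// (durable) metastore first, then remove the data. If the process dies
// between the two steps, the mark survives, so a periodic sweep or a
// re-issued drop can resume and finish the deletion.
public class MarkThenDeleteDrop {
    public enum State { LIVE, MARKED_DELETED, GONE }
    public final Map<String, State> metastore = new HashMap<>();

    public void dropPartition(String part) {
        metastore.put(part, State.MARKED_DELETED); // 1) durable mark first
        deleteData(part);                          // 2) then delete the files
        metastore.put(part, State.GONE);           // 3) finally commit the drop
    }

    void deleteData(String part) {
        // file deletion is idempotent, so resuming after a crash is safe
    }

    // periodic sweep: finish any drop that was marked but never completed
    public void sweep() {
        for (String part : metastore.keySet()) {
            if (metastore.get(part) == State.MARKED_DELETED) {
                dropPartition(part);
            }
        }
    }
}
```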
[jira] Resolved: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1524. -- Release Note: Committed to branch 0.6. Thanks Yuanjun! Resolution: Fixed > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, > HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
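The collision this issue describes, and the general fix direction, can be illustrated as follows (the naming scheme here is hypothetical; the actual patch chooses the plan path differently):

```java
import java.util.UUID;

// Illustration of the HIVE-1524 bug and the fix direction: deriving the
// plan file name from mapred.job.name collides when two parallel queries
// share a job name; deriving it from a per-query unique id cannot.
public class PlanFileName {
    public static String nameBasedPlanPath(String scratchDir, String jobName) {
        return scratchDir + "/" + jobName + ".plan";        // collides across queries
    }
    public static String uniquePlanPath(String scratchDir) {
        return scratchDir + "/plan-" + UUID.randomUUID() + ".xml";
    }
    public static void main(String[] args) {
        String a = nameBasedPlanPath("/tmp/hive", "my-job");
        String b = nameBasedPlanPath("/tmp/hive", "my-job");
        System.out.println(a.equals(b));                    // true: conflict
        System.out.println(uniquePlanPath("/tmp/hive")
                .equals(uniquePlanPath("/tmp/hive")));      // false: no conflict
    }
}
```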
[jira] Commented: (HIVE-675) add database/schema support Hive QL
[ https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915990#action_12915990 ] Ning Zhang commented on HIVE-675: - That works. Thanks Carl! > add database/schema support Hive QL > --- > > Key: HIVE-675 > URL: https://issues.apache.org/jira/browse/HIVE-675 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Carl Steinbach > Fix For: 0.6.0, 0.7.0 > > Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, > hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, > hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, > HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, > HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, > HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, > HIVE-675.13.patch.txt > > > Currently all Hive tables reside in single namespace (default). Hive should > support multiple namespaces (databases or schemas) such that users can create > tables in their specific namespaces. These name spaces can have different > warehouse directories (with a default naming scheme) and possibly different > properties. > There is already some support for this in metastore but Hive query parser > should have this feature as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915949#action_12915949 ] Ning Zhang commented on HIVE-1524: -- Currently branch 0.6 is broken. It may be caused by the HIVE-675 patch. I'll run the tests after that one is resolved. > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.6.0, 0.7.0 > > Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, > HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specify mapred.job.name before the query, two parallel queries will have > conflict plan file name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-675) add database/schema support Hive QL
[ https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915946#action_12915946 ] Ning Zhang commented on HIVE-675: - Hi Carl, Branch 0.6 currently is broken when running a unit test. The error is as follows: compile-test: [javac] /data/users/nzhang/reviews/0.6/branch-0.6/build-common.xml:307: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds [javac] Compiling 4 source files to /data/users/nzhang/reviews/0.6/branch-0.6/build/metastore/test/classes [javac] /data/users/nzhang/reviews/0.6/branch-0.6/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreRemote.java:76: partitionTester(org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf) in org.apache.hadoop.hive.metastore.TestHiveMetaStore cannot be applied to (org.apache.hadoop.hive.metastore.HiveMetaStoreClient,org.apache.hadoop.hive.conf.HiveConf,boolean) [javac] TestHiveMetaStore.partitionTester(client, hiveConf, true); [javac] ^ [javac] 1 error It seems the last patch that touches TestHiveMetaStore and TestHiveMetaStoreRemote is this patch. Can you take a look? 
> add database/schema support Hive QL > --- > > Key: HIVE-675 > URL: https://issues.apache.org/jira/browse/HIVE-675 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore, Query Processor >Reporter: Prasad Chakka >Assignee: Carl Steinbach > Fix For: 0.6.0, 0.7.0 > > Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, > hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, > hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, > HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, > HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, > HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, > HIVE-675.13.patch.txt > > > Currently all Hive tables reside in a single namespace (default). Hive should > support multiple namespaces (databases or schemas) such that users can create > tables in their specific namespaces. These namespaces can have different > warehouse directories (with a default naming scheme) and possibly different > properties. > There is already some support for this in the metastore, but the Hive query parser > should have this feature as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1524) parallel execution failed if mapred.job.name is set
[ https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915651#action_12915651 ] Ning Zhang commented on HIVE-1524: -- Thanks for back-porting your changes to 0.6. The code changes look good. Can you include the other 2 files (parallel.q and parallel.q.out) in the patch as well? > parallel execution failed if mapred.job.name is set > --- > > Key: HIVE-1524 > URL: https://issues.apache.org/jira/browse/HIVE-1524 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, > HIVE-1524.patch > > > The plan file name was generated based on mapred.job.name. If the user > specifies mapred.job.name before the query, two parallel queries will have > conflicting plan file names. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1378: - Status: Resolved (was: Patch Available) Resolution: Fixed Committed. Thanks Steven! > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, > HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test
[ https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1669: - Parent: HIVE-1658 Issue Type: Sub-task (was: Test) > non-deterministic display of storage parameter in test > -- > > Key: HIVE-1669 > URL: https://issues.apache.org/jira/browse/HIVE-1669 > Project: Hadoop Hive > Issue Type: Sub-task >Reporter: Ning Zhang > > With the change to beautify the 'desc extended table', the storage parameters > are displayed in a non-deterministic manner (since the underlying implementation is a > HashMap). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting
[ https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915437#action_12915437 ] Ning Zhang commented on HIVE-1658: -- Another issue is that 'desc extended' now displays table/partition parameters on separate lines. Since the parameters use an unordered map implementation, their display order is non-deterministic. It would be great if the pretty operator took care of ordering as well. > Fix describe [extended] column formatting > - > > Key: HIVE-1658 > URL: https://issues.apache.org/jira/browse/HIVE-1658 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Paul Yang >Assignee: Thiruvel Thirumoolan > > When displaying the column schema, the formatting should be > name type comment > to be in line with the previous formatting style for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
[ https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1582. -- Resolution: Not A Problem Talked to Namit and Yongqiang; this is not a bug. INSERT OVERWRITE to an (HDFS) directory should be merged as before. INSERT OVERWRITE LOCAL DIRECTORY cannot be merged, and that is not the case here. > merge mapfiles task behaves incorrectly for 'inserting overwrite directory...' > -- > > Key: HIVE-1582 > URL: https://issues.apache.org/jira/browse/HIVE-1582 > Project: Hadoop Hive > Issue Type: Bug >Reporter: He Yongqiang > > hive> > > > > > > SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; > hive>SET hive.exec.compress.output=false; > hive>INSERT OVERWRITE DIRECTORY 'x' > > SELECT from a; > Total MapReduce jobs = 2 > Launching Job 1 out of 2 > Number of reduce tasks is set to 0 since there's no reduce operator > .. > Ended Job = job_201008191557_54169 > Ended Job = 450290112, job is filtered out (removed at runtime). > Launching Job 2 out of 2 > . > the second job should not get started. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
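For reference, the distinction drawn in the resolution above can be sketched as follows (the paths and the use of src are illustrative, not from the issue):

```sql
-- HDFS target: small output files may be merged by a follow-up merge job,
-- which is the "Job 2" seen in the report and is expected behavior.
INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out'
SELECT key FROM src;

-- Local target: results are copied to the client machine; no merge job runs.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out'
SELECT key FROM src;
```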
[jira] Commented: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 columns selected
[ https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914764#action_12914764 ] Ning Zhang commented on HIVE-1670: -- Not sure whether this patch fixes that bug. Maybe they can try this patch with their query. > MapJoin throws EOFException when the mapjoined table has 0 columns selected > - > > Key: HIVE-1670 > URL: https://issues.apache.org/jira/browse/HIVE-1670 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1670.patch > > > select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); > throws EOFException -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914756#action_12914756 ] Ning Zhang commented on HIVE-1378: -- +1. testing. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, > HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 columns selected
[ https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1670: - Status: Patch Available (was: Open) > MapJoin throws EOFException when the mapjoined table has 0 columns selected > - > > Key: HIVE-1670 > URL: https://issues.apache.org/jira/browse/HIVE-1670 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1670.patch > > > select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); > throws EOFException -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 columns selected
[ https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1670: - Attachment: HIVE-1670.patch > MapJoin throws EOFException when the mapjoined table has 0 columns selected > - > > Key: HIVE-1670 > URL: https://issues.apache.org/jira/browse/HIVE-1670 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1670.patch > > > select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); > throws EOFException -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1670) MapJoin throws EOFException when the mapjoined table has 0 columns selected
MapJoin throws EOFException when the mapjoined table has 0 columns selected - Key: HIVE-1670 URL: https://issues.apache.org/jira/browse/HIVE-1670 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); throws EOFException -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1669) non-deterministic display of storage parameter in test
non-deterministic display of storage parameter in test -- Key: HIVE-1669 URL: https://issues.apache.org/jira/browse/HIVE-1669 Project: Hadoop Hive Issue Type: Test Reporter: Ning Zhang With the change to beautify the 'desc extended table', the storage parameters are displayed in a non-deterministic manner (since the underlying implementation is a HashMap). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1659: - Status: Resolved (was: Patch Available) Resolution: Fixed Committed. Thanks Xing! > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Ning Zhang > Attachments: HIVE-1659.patch, HIVE-1659.patch2, HIVE-1659.patch3 > > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. > However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
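The difference between the existing UDF and the committed UDTF can be sketched roughly as follows (the table url_log and its column url are hypothetical, and the exact argument syntax is assumed from the patch description):

```sql
-- UDF form: the URL string is parsed once per call, i.e. three times here.
SELECT parse_url(url, 'HOST'),
       parse_url(url, 'PATH'),
       parse_url(url, 'QUERY', 'id')
FROM url_log;

-- UDTF form: a single parse yields all requested parts as one tuple.
SELECT t.host, t.path, t.qid
FROM url_log
LATERAL VIEW parse_url_tuple(url, 'HOST', 'PATH', 'QUERY:id') t
  AS host, path, qid;
```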
[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914602#action_12914602 ] Ning Zhang commented on HIVE-1659: -- Xing, there is a diff in show_functions.q. You need to overwrite the .out file with the addition of the new function. The following command will update the .out file. ant test -Dtestcase=TestCliDriver -Dqfile=show_functions.q -Doverwrite=true Can you regenerate the patch after that? > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Ning Zhang > Attachments: HIVE-1659.patch, HIVE-1659.patch2 > > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. > However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914598#action_12914598 ] Ning Zhang commented on HIVE-1378: -- Before we decide to drop support for pre-0.20, we should have a separate JIRA listing the things that need to be cleaned up: e.g., excluding downloading & building Hadoop 0.17. In the meantime, the changes in the patch needed for pre-0.20 compatibility should be minimal. Steven, can you take a look at the code and see how much work is required to be compatible with 0.17? > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914360#action_12914360 ] Ning Zhang commented on HIVE-1659: -- Also, when you generate the patch, you need to run 'svn diff' at the "root" directory of the hive trunk. > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Ning Zhang > Attachments: HIVE-1659.patch > > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. > However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914358#action_12914358 ] Ning Zhang commented on HIVE-1659: -- Xing, this patch doesn't apply cleanly against the latest trunk. Can you 'svn up' and regenerate the patch? You may need to resolve any conflicts after 'svn up'. > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Ning Zhang > Attachments: HIVE-1659.patch > > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. > However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914346#action_12914346 ] Ning Zhang commented on HIVE-1378: -- @john, should we run a survey on the hive-user mailing list to see how many people are still using pre-0.20 hadoop before dropping the support? > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914337#action_12914337 ] Ning Zhang commented on HIVE-1659: -- Xing, the patch was not attached. Can you use the link "Attach file" in the left pane? > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.5.0 >Reporter: Ning Zhang > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. > However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914335#action_12914335 ] Ning Zhang commented on HIVE-1378: -- Steven, tests passed for hadoop 0.20, but it failed to compile on hadoop 0.17 (ant clean package -Dhadoop.version=0.17.2.1). Can you take a look? > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914305#action_12914305 ] Ning Zhang commented on HIVE-1378: -- OK. This one applied cleanly. I'm starting testing. I think 'svn up' may be able to do more merging than 'patch'. I got the conflict on eclipse-templates/.classpath (it asked me whether I want to reverse apply) and another file. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914282#action_12914282 ] Ning Zhang commented on HIVE-1378: -- Changes look good. However there are conflicts when applying to the latest trunk. Can you generate a new one against the latest trunk? I'll start testing once I got the new patch. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.5.java_only.patch HIVE-1361.5.patch Uploading a new set of patches that resolve the conflicts with the latest commits. > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, > HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, > HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
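Milestone 3 in the issue description corresponds to a statement of roughly this shape (the table name and partition spec are hypothetical; the exact syntax is assumed from the proposal, not quoted from the patch):

```sql
-- Gathers partition-level stats (row count, total size, file count, etc.)
-- for one existing partition, without rewriting its data.
ANALYZE TABLE page_views PARTITION (ds='2010-09-22') COMPUTE STATISTICS;
```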
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.4.java_only.patch HIVE-1361.4.patch Uploading new patch that refreshed to the latest trunk. Also added a negative test case analyze.q and some trivial clean up in Java code (removing commented out contents). > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, > HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913843#action_12913843 ] Ning Zhang commented on HIVE-1526: -- The Hive ODBC code is dependent on Thrift as well. In particular the hive client and unixODBC libraries have to be linked with the new libthrift.so. Can you test if the ODBC code is compatible with the new thrift version? > Hive should depend on a release version of Thrift > - > > Key: HIVE-1526 > URL: https://issues.apache.org/jira/browse/HIVE-1526 > Project: Hadoop Hive > Issue Type: Task > Components: Build Infrastructure >Reporter: Carl Steinbach >Assignee: Todd Lipcon > Attachments: hive-1526.txt, libfb303.jar, libthrift.jar > > > Hive should depend on a release version of Thrift, and ideally it should use > Ivy to resolve this dependency. > The Thrift folks are working on adding Thrift artifacts to a maven repository > here: https://issues.apache.org/jira/browse/THRIFT-363 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.3.patch Updated HIVE-1361.3.patch. > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: (was: HIVE-1361.3.patch) > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting
[ https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913668#action_12913668 ] Ning Zhang commented on HIVE-1658: -- +1 on keeping the old format but adding a "pretty operator" as the child of the explain, so that the execution plan for the EXPLAIN is an explain operator (with the old formatting) followed by an optional "pretty operator" that takes the output and does further formatting. > Fix describe [extended] column formatting > - > > Key: HIVE-1658 > URL: https://issues.apache.org/jira/browse/HIVE-1658 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Paul Yang >Assignee: Thiruvel Thirumoolan > > When displaying the column schema, the formatting should be > name type comment > to be in line with the previous formatting style for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.3.patch Uploading HIVE-1361.3.patch, which passes all tests on hadoop 0.20 & 0.17. The only difference from the last patch is the log change in stats2.q.out. > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913338#action_12913338 ] Ning Zhang commented on HIVE-1378: -- Steven, there are conflicts when applying to the latest trunk. Can you regenerate the patch? > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, > HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913301#action_12913301 ] Ning Zhang commented on HIVE-1651: -- Discussed with Joydeep offline. The side effects of a failed task should be cleaned up after the job finishes. _tmp* files are already taken care of in the current code base. The only side effect that needs to be taken care of is the empty directories created by failed dynamic partition inserts. This issue is addressed in HIVE-1655. > ScriptOperator should not forward any output to downstream operators if an > exception is happened > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forward the output from stdout to downstream operators. In > case of any exceptions to the script (e.g., got killed), the ScriptOperator > got an exception and throw it to upstream operators until MapOperator got it > and call close(abort). Before the ScriptOperator.close() is called the script > output stream can still forward output to downstream operators. We should > terminate it immediately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Status: Patch Available (was: Open) > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.2.patch HIVE-1361.2_java_only.patch Uploading a new patch (including a full version and a Java_only version including XML build files) for review. This is against the latest trunk. The major changes from the last patch include: 1) Make JDBC update/insert/select use PreparedStatement(). 2) In HBase, use HTable.delete(ArrayList) to speed up delete, and flushCommit() to batch update. 3) Refactor StatsTask to put stats into PartitionStatistics and TableStatistics so that it is easier to add new stats later. 4) Move WriteEntity creation from StatsTask to compile-time. I'm running tests again after refreshing to the latest trunk. > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task > Components: Query Processor >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Fix For: 0.7.0 > > Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, > HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. 
> The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
[ https://issues.apache.org/jira/browse/HIVE-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913081#action_12913081 ] Ning Zhang commented on HIVE-1659: -- parse_url currently supports two signatures: parse_url(fullurl, '[QUERY|PATH|HOST|...]') and parse_url(fullurl, 'QUERY', '[ref|sk|...]'). In parse_url_tuple, the syntax is consolidated as parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY:ref', 'QUERY:sk',...). > parse_url_tuple: a UDTF version of parse_url > - > > Key: HIVE-1659 > URL: https://issues.apache.org/jira/browse/HIVE-1659 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Ning Zhang > > The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from > it. However it can only extract an atomic value from the URL. If we want to > extract multiple pieces of information, we need to call the function many > times. It is desirable to parse the URL once, extract all needed > information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1659) parse_url_tuple: a UDTF version of parse_url
parse_url_tuple: a UDTF version of parse_url - Key: HIVE-1659 URL: https://issues.apache.org/jira/browse/HIVE-1659 Project: Hadoop Hive Issue Type: New Feature Reporter: Ning Zhang The UDF parse_url takes a URL, parses it, and extracts QUERY/PATH etc. from it. However it can only extract an atomic value from the URL. If we want to extract multiple pieces of information, we need to call the function many times. It is desirable to parse the URL once, extract all needed information, and return a tuple in a UDTF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
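The consolidated syntax (HOST, PATH, QUERY, QUERY:&lt;key&gt;) can be sketched in a few lines. This is an illustrative model of the semantics using Python's standard urllib.parse, not the actual Hive UDTF:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative sketch of parse_url_tuple semantics; not Hive's implementation.
def parse_url_tuple(url, *parts):
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    out = []
    for part in parts:
        if part == "HOST":
            out.append(parsed.hostname)
        elif part == "PATH":
            out.append(parsed.path)
        elif part == "QUERY":
            out.append(parsed.query)
        elif part.startswith("QUERY:"):
            # 'QUERY:ref' extracts a single key from the query string
            key = part.split(":", 1)[1]
            out.append(query.get(key, [None])[0])
        else:
            out.append(None)
    return tuple(out)
```

With one call, parse_url_tuple("http://example.com/a/b?ref=x&sk=y", "HOST", "PATH", "QUERY:ref") yields all three pieces, where plain parse_url would need three separate calls.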
[jira] Commented: (HIVE-1609) Support partition filtering in metastore
[ https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912855#action_12912855 ] Ning Zhang commented on HIVE-1609: -- @namit, the Hive metastore already has the API to get all sub-partitions given a partial specification like you provided -- Hive.getPartitions(Table, partialPartSpec). > Support partition filtering in metastore > > > Key: HIVE-1609 > URL: https://issues.apache.org/jira/browse/HIVE-1609 > Project: Hadoop Hive > Issue Type: New Feature > Components: Metastore >Reporter: Ajay Kidave >Assignee: Ajay Kidave > Fix For: 0.7.0 > > Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch > > > The metastore needs to have support for returning a list of partitions based > on user specified filter conditions. This will be useful for tools which need > to do partition pruning. Howl is one such use case. The way partition pruning > is done during hive query execution need not be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
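The partial-specification matching that Hive.getPartitions(Table, partialPartSpec) provides can be sketched as follows. This is a hypothetical helper for illustration, not the metastore implementation: a partition matches when it agrees on every column the partial spec pins down.

```python
# Hypothetical sketch of partial partition-spec matching; not Hive code.
def matches_partial_spec(part_spec, partial_spec):
    """part_spec, partial_spec: dicts of partition column -> value."""
    return all(part_spec.get(col) == val for col, val in partial_spec.items())
```

A partial spec like {"ds": "2010-09-22"} then selects all hr sub-partitions of that ds.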
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910899#action_12910899 ] Ning Zhang commented on HIVE-1378: -- Looks good in general. I've left some minor comments on Cloudera's review board. I'm not sure if they can be replicated here; if not, I'll copy them manually. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions
[ https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1655: - Attachment: HIVE-1655.patch > Adding consistency check at jobClose() when committing dynamic partitions > - > > Key: HIVE-1655 > URL: https://issues.apache.org/jira/browse/HIVE-1655 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1655.patch > > > In case of dynamic partition insert, FileSinkOperator generated a directory > for a new partition and the files in the directory is named with '_tmp*'. > When a task succeed, the file is renamed to remove the "_tmp", which > essentially implement the "commit" semantics. A lot of exceptions could > happen (process got killed, machine dies etc.) could left the _tmp files > exist in the DP directory. These _tmp files should be deleted ("rolled back") > at successful jobClose(). After the deletion, we should also delete any empty > directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions
[ https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1655: - Status: Patch Available (was: Open) > Adding consistency check at jobClose() when committing dynamic partitions > - > > Key: HIVE-1655 > URL: https://issues.apache.org/jira/browse/HIVE-1655 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1655.patch > > > In case of dynamic partition insert, FileSinkOperator generated a directory > for a new partition and the files in the directory is named with '_tmp*'. > When a task succeed, the file is renamed to remove the "_tmp", which > essentially implement the "commit" semantics. A lot of exceptions could > happen (process got killed, machine dies etc.) could left the _tmp files > exist in the DP directory. These _tmp files should be deleted ("rolled back") > at successful jobClose(). After the deletion, we should also delete any empty > directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions
[ https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910880#action_12910880 ] Ning Zhang commented on HIVE-1655: -- Actually the _tmp files are taken care of by FSPaths.commit() called at FileSinkOperator.close(), and any missed _tmp* files are removed in jobClose() -> Utilities.removeTempOrDuplicateFiles(). The only missing piece is removing the empty directories at jobClose(). > Adding consistency check at jobClose() when committing dynamic partitions > - > > Key: HIVE-1655 > URL: https://issues.apache.org/jira/browse/HIVE-1655 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > > In case of dynamic partition insert, FileSinkOperator generated a directory > for a new partition and the files in the directory is named with '_tmp*'. > When a task succeed, the file is renamed to remove the "_tmp", which > essentially implement the "commit" semantics. A lot of exceptions could > happen (process got killed, machine dies etc.) could left the _tmp files > exist in the DP directory. These _tmp files should be deleted ("rolled back") > at successful jobClose(). After the deletion, we should also delete any empty > directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions
Adding consistency check at jobClose() when committing dynamic partitions - Key: HIVE-1655 URL: https://issues.apache.org/jira/browse/HIVE-1655 Project: Hadoop Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang In the case of a dynamic partition insert, FileSinkOperator generates a directory for each new partition, and the files in the directory are named '_tmp*'. When a task succeeds, the file is renamed to remove the "_tmp", which essentially implements the "commit" semantics. Many exceptions (a process got killed, a machine died, etc.) could leave _tmp files in the DP directory. These _tmp files should be deleted ("rolled back") at a successful jobClose(). After the deletion, we should also delete any empty directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910834#action_12910834 ] Ning Zhang commented on HIVE-1651: -- @joydeep, the output file will not be committed if an exception occurs and close(abort=true) is called. This bug happens in the short time window after the exception occurs and before close(abort) is called. Although the file gets deleted, the dynamic partition insert has already created a directory, which will later be considered an empty partition. > ScriptOperator should not forward any output to downstream operators if an > exception is happened > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forward the output from stdout to downstream operators. In > case of any exceptions to the script (e.g., got killed), the ScriptOperator > got an exception and throw it to upstream operators until MapOperator got it > and call close(abort). Before the ScriptOperator.close() is called the script > output stream can still forward output to downstream operators. We should > terminate it immediately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910758#action_12910758 ] Ning Zhang commented on HIVE-1378: -- Taking a look now. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.1.patch, HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1651: - Status: Patch Available (was: Open) > ScriptOperator should not forward any output to downstream operators if an > exception is happened > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forward the output from stdout to downstream operators. In > case of any exceptions to the script (e.g., got killed), the ScriptOperator > got an exception and throw it to upstream operators until MapOperator got it > and call close(abort). Before the ScriptOperator.close() is called the script > output stream can still forward output to downstream operators. We should > terminate it immediately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened
[ https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1651: - Attachment: HIVE-1651.patch > ScriptOperator should not forward any output to downstream operators if an > exception is happened > > > Key: HIVE-1651 > URL: https://issues.apache.org/jira/browse/HIVE-1651 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1651.patch > > > ScriptOperator spawns 2 threads for getting the stdout and stderr from the > script and then forward the output from stdout to downstream operators. In > case of any exceptions to the script (e.g., got killed), the ScriptOperator > got an exception and throw it to upstream operators until MapOperator got it > and call close(abort). Before the ScriptOperator.close() is called the script > output stream can still forward output to downstream operators. We should > terminate it immediately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened
ScriptOperator should not forward any output to downstream operators if an exception is happened Key: HIVE-1651 URL: https://issues.apache.org/jira/browse/HIVE-1651 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang ScriptOperator spawns 2 threads to read the stdout and stderr of the script and then forwards the output from stdout to downstream operators. In case of any exception in the script (e.g., it got killed), the ScriptOperator gets an exception and throws it to upstream operators until MapOperator gets it and calls close(abort). Before ScriptOperator.close() is called, the script output stream can still forward output to downstream operators. We should terminate it immediately. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
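The shape of the fix can be sketched with a small model of the operator: a reader thread drains the script's stdout and forwards rows downstream, but checks an abort flag first so that forwarding stops as soon as a failure is detected. This is an illustrative sketch (class and callback names are invented), not Hive's ScriptOperator:

```python
import subprocess
import threading

# Minimal model of the fix; not Hive's ScriptOperator implementation.
class ScriptRunner:
    def __init__(self, cmd, forward):
        self.forward = forward          # downstream consumer callback
        self.abort = threading.Event()  # set when the script fails
        self.proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

    def _drain_stdout(self):
        for line in self.proc.stdout:
            if self.abort.is_set():     # stop forwarding immediately on error
                break
            self.forward(line.rstrip("\n"))

    def run(self):
        t = threading.Thread(target=self._drain_stdout)
        t.start()
        rc = self.proc.wait()
        if rc != 0:
            self.abort.set()            # script died: cut off downstream output
        t.join()
        return rc
```

Without the abort check, the reader thread could keep pushing buffered rows downstream between the script's failure and close(abort), which is exactly the window this issue describes.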
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910455#action_12910455 ] Ning Zhang commented on HIVE-1378: -- Steven, there are conflicts when applying this patch. Can you regenerate it? > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string
[ https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910444#action_12910444 ] Ning Zhang commented on HIVE-1378: -- Will take a look. > Return value for map, array, and struct needs to return a string > - > > Key: HIVE-1378 > URL: https://issues.apache.org/jira/browse/HIVE-1378 > Project: Hadoop Hive > Issue Type: Improvement > Components: Drivers >Reporter: Jerome Boulon >Assignee: Steven Wong > Fix For: 0.7.0 > > Attachments: HIVE-1378.patch > > > In order to be able to select/display any data from JDBC Hive driver, return > value for map, array, and struct needs to return a string -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Status: Patch Available (was: Open) > table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1648: - Parent: HIVE-33 Issue Type: Sub-task (was: New Feature) > Automatically gathering stats when reading a table/partition > > > Key: HIVE-1648 > URL: https://issues.apache.org/jira/browse/HIVE-1648 > Project: Hadoop Hive > Issue Type: Sub-task >Reporter: Ning Zhang > > HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to > gathering stats. This requires additional scan of the data. Stats gathering > can be piggy-backed on TableScanOperator whenever a table/partition is > scanned (given not LIMIT operator). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1648) Automatically gathering stats when reading a table/partition
Automatically gathering stats when reading a table/partition Key: HIVE-1648 URL: https://issues.apache.org/jira/browse/HIVE-1648 Project: Hadoop Hive Issue Type: New Feature Reporter: Ning Zhang HIVE-1361 introduces a new command, 'ANALYZE TABLE T COMPUTE STATISTICS', to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
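The piggy-backing idea is simple to sketch: the scan forwards rows unchanged while updating a counter as a side effect, so stats come for free from work the query already does. An illustrative model (not Hive's TableScanOperator):

```python
# Illustrative model of piggy-backed stats gathering; not Hive code.
class CountingScan:
    """Wraps a row source; counts rows as they stream through the scan."""

    def __init__(self, rows):
        self.rows = rows
        self.num_rows = 0

    def __iter__(self):
        for row in self.rows:
            self.num_rows += 1  # stats gathered during the normal scan
            yield row
```

A LIMIT would stop the scan early, which is why the partial count would be wrong in that case and the piggy-backing must be skipped.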
[jira] Commented: (HIVE-33) [Hive]: Add ability to compute statistics on hive tables
[ https://issues.apache.org/jira/browse/HIVE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910358#action_12910358 ] Ning Zhang commented on HIVE-33: Patches for HIVE-1361 are ready for review. Comments are welcome! > [Hive]: Add ability to compute statistics on hive tables > > > Key: HIVE-33 > URL: https://issues.apache.org/jira/browse/HIVE-33 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Ashish Thusoo >Assignee: Ahmed M Aly > > Add commands to collect partition and column level statistics in hive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1361) table/partition level statistics
[ https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1361: - Attachment: HIVE-1361.patch HIVE-1361.java_only.patch Uploading a full version (HIVE-1361.patch) and a Java code only version (HIVE-1361.java_only.patch). This patch is based on Ahmed's previous patch and implements the following features: 1) Automatically gather stats (currently the number of rows) whenever an INSERT OVERWRITE TABLE is issued. Each mapper/reducer pushes its partial stats to either MySQL/Derby through JDBC or to HBase. The INSERT OVERWRITE statement can be anything, including dynamic partition inserts, multi-table inserts, and inserts into bucketized partitions. A StatsTask is responsible for aggregating the partial stats at the end of the query and updating the metastore. 2) The stats of a table/partition are exposed to the user by 'DESC EXTENDED' on the table/partition. They are stored as storage parameters (numRows, numFiles, numPartitions). 3) Introduce a new command 'ANALYZE TABLE [PARTITION (PARTITION SPEC)] COMPUTE STATISTICS' to scan the table/partition and gather stats in a similar fashion to the INSERT OVERWRITE command, except that the plan has only 1 MR job consisting of a TableScanOperator and a StatsTask. The partition spec can be a full or partial partition spec, similar to what dynamic partition insert uses. This allows the user to analyze a subset of or all partitions of a table. The resulting stats are stored in the same parameters in the metastore. Tested locally (unit tests) for JDBC:derby and HBase, and on a cluster with JDBC:MySQL. Will run the full unit tests again. 
> table/partition level statistics > > > Key: HIVE-1361 > URL: https://issues.apache.org/jira/browse/HIVE-1361 > Project: Hadoop Hive > Issue Type: Sub-task >Affects Versions: 0.6.0 >Reporter: Ning Zhang >Assignee: Ahmed M Aly > Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch > > > At the first step, we gather table-level stats for non-partitioned table and > partition-level stats for partitioned table. Future work could extend the > table level stats to partitioned table as well. > There are 3 major milestones in this subtask: > 1) extend the insert statement to gather table/partition level stats > on-the-fly. > 2) extend metastore API to support storing and retrieving stats for a > particular table/partition. > 3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for > existing tables/partitions. > The proposed stats are: > Partition-level stats: > - number of rows > - total size in bytes > - number of files > - max, min, average row sizes > - max, min, average file sizes > Table-level stats in addition to partition level stats: > - number of partitions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
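The aggregation step described above (mappers/reducers publish partial stats; a final task sums them per partition before updating the metastore) can be modeled in a few lines. Names and data shapes here are illustrative assumptions, not the StatsTask API:

```python
from collections import defaultdict

# Illustrative model of the StatsTask aggregation step; not Hive code.
def aggregate_partial_stats(partials):
    """partials: iterable of (partition_key, partial_row_count) pairs
    published by individual mappers/reducers."""
    totals = defaultdict(int)
    for partition_key, num_rows in partials:
        totals[partition_key] += num_rows
    return dict(totals)  # per-partition totals to write to the metastore
```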
[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma
[ https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1639: - Attachment: HIVE-1639.2.patch Added a test case. > ExecDriver.addInputPaths() error if partition name contains a comma > --- > > Key: HIVE-1639 > URL: https://issues.apache.org/jira/browse/HIVE-1639 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1639.2.patch, HIVE-1639.patch > > > ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes > a comma-separated string representing a set of paths. If the path name of an > input file contains a comma, this code throws an exception: > java.lang.IllegalArgumentException: Can not create a Path from an empty > string. > Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths() > should iterate over all paths and call FileInputFormat.addInputPath() for each. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909943#action_12909943 ] Ning Zhang commented on HIVE-1570: -- Joy, scriptfile1.q actually failed on TestMinimrCliDriver with the command ant test -Dhadoop.version=0.20.0 -Dtestcase=TestMinimrCliDriver -Dminimr.query.files=scriptfile1.q It gives an NPE at ExecDriver.java:625. This NPE is a different issue and can be solved by changing 'conf' to 'job'. But even with this change, although the NPE is gone, the test still fails. Should we move this test outside minimr.query.files for now, until this JIRA is fixed? > referencing an added file by its name in a transform script does not work in > hive local mode > - > > Key: HIVE-1570 > URL: https://issues.apache.org/jira/browse/HIVE-1570 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > > Yongqiang tried this and it fails in local mode: > add file ../data/scripts/dumpdata_script.py; > select count(distinct subq.key) from > (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key > = 10) subq; > this needs to be fixed because it means we cannot choose local mode > automatically in the case of transform scripts (since different paths need to be > used for cluster vs. local mode execution) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma
[ https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1639: - Status: Patch Available (was: Open) > ExecDriver.addInputPaths() error if partition name contains a comma > --- > > Key: HIVE-1639 > URL: https://issues.apache.org/jira/browse/HIVE-1639 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1639.patch > > > ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes > a comma-separated string representing a set of paths. If the path name of an > input file contains a comma, this code throws an exception: > java.lang.IllegalArgumentException: Can not create a Path from an empty > string. > Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths() > should iterate over all paths and call FileInputFormat.addInputPath() for each. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma
[ https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1639: - Attachment: HIVE-1639.patch > ExecDriver.addInputPaths() error if partition name contains a comma > --- > > Key: HIVE-1639 > URL: https://issues.apache.org/jira/browse/HIVE-1639 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1639.patch > > > ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes > a comma-separated string representing a set of paths. If the path name of an > input file contains a comma, this code throws an exception: > java.lang.IllegalArgumentException: Can not create a Path from an empty > string. > Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths() > should iterate over all paths and call FileInputFormat.addInputPath() for each. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma
ExecDriver.addInputPaths() error if partition name contains a comma --- Key: HIVE-1639 URL: https://issues.apache.org/jira/browse/HIVE-1639 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes a comma-separated string representing a set of paths. If the path name of an input file contains a comma, this code throws an exception: java.lang.IllegalArgumentException: Can not create a Path from an empty string. Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths() should iterate over all paths and call FileInputFormat.addInputPath() for each. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
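A minimal sketch (plain Java, not Hive's actual code) of why the single comma-separated path string fails: a partition value containing a comma is re-split into bogus fragments, whereas handing each path over individually, as the issue proposes, keeps it intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the failure mode described in HIVE-1639. The method name below is
// hypothetical; it only mimics how a comma-separated path string gets re-split.
public class CommaPaths {
    // Mimics splitting a comma-separated path list back into paths.
    static List<String> splitOnCommas(String commaSeparated) {
        return new ArrayList<>(Arrays.asList(commaSeparated.split(",")));
    }

    public static void main(String[] args) {
        // A single partition directory whose partition value contains a comma:
        String partitionPath = "/warehouse/t/ds=2010-09-15,1";

        // Joining paths into one string and re-splitting on commas mangles it
        // into two fragments, one of which is not a valid path.
        List<String> broken = splitOnCommas(partitionPath);
        System.out.println(broken.size()); // 2, not 1

        // The fix suggested in the issue: never join paths into one string;
        // add each path individually (analogous to calling
        // FileInputFormat.addInputPath once per path).
        List<String> correct = new ArrayList<>();
        correct.add(partitionPath);
        System.out.println(correct.size()); // 1
    }
}
```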
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908639#action_12908639 ] Ning Zhang commented on HIVE-1629: -- Good question, John. I think this patch doesn't affect bucketing, which is implemented using ObjectInspectorUtils.hashCode(). In fact, the hash function used there for Double is the same as the one provided in this patch. But I'll double-check with Zheng/Namit tomorrow. > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Fix For: 0.7.0 > > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of Hive's DoubleWritable class. > It prevents a HashMap keyed by DoubleWritable from degenerating into a LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang resolved HIVE-1629. -- Fix Version/s: 0.7.0 Resolution: Fixed Committed. Thanks, Vaibhav! > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Fix For: 0.7.0 > > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of Hive's DoubleWritable class. > It prevents a HashMap keyed by DoubleWritable from degenerating into a LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
[ https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1622: - Attachment: HIVE-1622_0.17.patch Oops, forgot the patch with the hadoop 0.17 logs. > Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true > --- > > Key: HIVE-1622 > URL: https://issues.apache.org/jira/browse/HIVE-1622 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Fix For: 0.7.0 > > Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch > > > Currently map-only merge (using CombineHiveInputFormat) is only enabled for > merging files generated by mappers. It should be used for files generated by > reducers as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908599#action_12908599 ] Ning Zhang commented on HIVE-1629: -- +1. Will commit if tests pass. > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of Hive's DoubleWritable class. > It prevents a HashMap keyed by DoubleWritable from degenerating into a LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang reassigned HIVE-1629: Assignee: Vaibhav Aggarwal > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal >Assignee: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of Hive's DoubleWritable class. > It prevents a HashMap keyed by DoubleWritable from degenerating into a LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1629) Patch to fix hashCode method in DoubleWritable class
[ https://issues.apache.org/jira/browse/HIVE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908194#action_12908194 ] Ning Zhang commented on HIVE-1629: -- +long v = Double.doubleToLongBits(value); +return (int) (v ^ (v >>> 32)); won't this return 0 for all long values less than 2^32? Searching the web, the following 64-bit-to-32-bit hash seems to be a good one: http://www.cris.com/~ttwang/tech/inthash.htm > Patch to fix hashCode method in DoubleWritable class > > > Key: HIVE-1629 > URL: https://issues.apache.org/jira/browse/HIVE-1629 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Vaibhav Aggarwal > Attachments: HIVE-1629.patch > > > A patch to fix the hashCode() method of Hive's DoubleWritable class. > It prevents a HashMap keyed by DoubleWritable from degenerating into a LinkedList. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
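For context, a self-contained sketch of the hash the patch proposes: it folds the 64-bit IEEE 754 bit pattern of the double into 32 bits by XOR-ing the two halves, the same recipe as java.lang.Double.hashCode(). On the question raised in the comment: for bit patterns below 2^32 the fold returns the low 32 bits unchanged rather than 0, since the upper half is zero.

```java
// Sketch of the hashCode from the patch (class name here is hypothetical).
public class DoubleHash {
    static int hash(double value) {
        // Fold the 64-bit pattern into 32 bits; same as Double.hashCode().
        long v = Double.doubleToLongBits(value);
        return (int) (v ^ (v >>> 32));
    }

    public static void main(String[] args) {
        // Distinct doubles spread across buckets instead of all colliding,
        // which is what kept the HashMap from degenerating into a list.
        System.out.println(hash(1.0) == hash(2.0)); // false
        // A bit pattern below 2^32 (a subnormal double) hashes to its own
        // low 32 bits, not to 0: v >>> 32 is 0, so v ^ 0 == v.
        System.out.println(hash(Double.longBitsToDouble(42L))); // 42
    }
}
```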
[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
[ https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1622: - Status: Patch Available (was: Open) > Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true > --- > > Key: HIVE-1622 > URL: https://issues.apache.org/jira/browse/HIVE-1622 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1622.patch > > > Currently map-only merge (using CombineHiveInputFormat) is only enabled for > merging files generated by mappers. It should be used for files generated by > reducers as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
[ https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1622: - Attachment: HIVE-1622.patch > Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true > --- > > Key: HIVE-1622 > URL: https://issues.apache.org/jira/browse/HIVE-1622 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1622.patch > > > Currently map-only merge (using CombineHiveInputFormat) is only enabled for > merging files generated by mappers. It should be used for files generated by > reducers as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true --- Key: HIVE-1622 URL: https://issues.apache.org/jira/browse/HIVE-1622 Project: Hadoop Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Ning Zhang Currently map-only merge (using CombineHiveInputFormat) is only enabled for merging files generated by mappers. It should be used for files generated by reducers as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string
[ https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1614: - Attachment: HIVE-1614.2.patch Added a catch for all Throwables in the UDTF. > UDTF json_tuple should return null row when input is not a valid JSON string > > > Key: HIVE-1614 > URL: https://issues.apache.org/jira/browse/HIVE-1614 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1614.2.patch, HIVE-1614.patch > > > If the input column is not a valid JSON string, json_tuple will not return > anything, which prevents the downstream operators from accessing the > left-hand-side table. We should output a NULL row instead, similar to when > the input column is a NULL value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string
[ https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1614: - Status: Patch Available (was: Open) Affects Version/s: 0.7.0 > UDTF json_tuple should return null row when input is not a valid JSON string > > > Key: HIVE-1614 > URL: https://issues.apache.org/jira/browse/HIVE-1614 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1614.patch > > > If the input column is not a valid JSON string, json_tuple will not return > anything, which prevents the downstream operators from accessing the > left-hand-side table. We should output a NULL row instead, similar to when > the input column is a NULL value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string
[ https://issues.apache.org/jira/browse/HIVE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1614: - Attachment: HIVE-1614.patch > UDTF json_tuple should return null row when input is not a valid JSON string > > > Key: HIVE-1614 > URL: https://issues.apache.org/jira/browse/HIVE-1614 > Project: Hadoop Hive > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1614.patch > > > If the input column is not a valid JSON string, json_tuple will not return > anything, which prevents the downstream operators from accessing the > left-hand-side table. We should output a NULL row instead, similar to when > the input column is a NULL value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1614) UDTF json_tuple should return null row when input is not a valid JSON string
UDTF json_tuple should return null row when input is not a valid JSON string Key: HIVE-1614 URL: https://issues.apache.org/jira/browse/HIVE-1614 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang If the input column is not a valid JSON string, json_tuple will not return anything, which prevents the downstream operators from accessing the left-hand-side table. We should output a NULL row instead, similar to when the input column is a NULL value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
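A minimal sketch of the pattern the fix describes (hypothetical names, not Hive's actual UDTF code): if evaluating the input throws anything, forward a row of all NULLs instead of forwarding nothing, so downstream operators still see one row per input, just as with a NULL input column.

```java
import java.util.Arrays;

// Sketch of "return a NULL row on bad input" from HIVE-1614. The parsing
// step below is a stand-in, not a real JSON parser.
public class NullRowOnBadJson {
    static String[] extractTuple(String json, int numFields) {
        try {
            // Stand-in for real JSON parsing; malformed input throws here.
            if (json == null || !json.trim().startsWith("{")) {
                throw new IllegalArgumentException("not a JSON object: " + json);
            }
            // ... real field extraction would go here ...
            String[] row = new String[numFields];
            Arrays.fill(row, "parsed"); // placeholder for extracted values
            return row;
        } catch (Throwable t) {
            // Same shape as a NULL input: one row, every column null.
            return new String[numFields];
        }
    }

    public static void main(String[] args) {
        String[] row = extractTuple("garbage", 3);
        System.out.println(row.length);     // 3: a row is still produced
        System.out.println(row[0] == null); // true: all columns are NULL
    }
}
```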
[jira] Commented: (HIVE-1467) dynamic partitioning should cluster by partitions
[ https://issues.apache.org/jira/browse/HIVE-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905700#action_12905700 ] Ning Zhang commented on HIVE-1467: -- As discussed with Joydeep and Ashish, it seems we should use the "distribute by" mechanism rather than "cluster by" to avoid sorting on the reducer side. The difference between them is that "distribute by" only sets the MapReduce partition columns to the dynamic partition columns, while "cluster by" additionally sets the key columns to the dynamic partition columns as well. So I think we can support two modes of reducer-side DP, with tradeoffs: -- distribute-by mode: no sorting, but reducers have to keep all files open during the DP insert. A good choice when a large amount of data is passed from mappers to reducers. -- cluster-by mode: sorting by the DP columns, but we can close a DP file once FileSinkOperator sees a different DP column value. A good choice when the total data size is not that large but a large number of DPs are generated. > dynamic partitioning should cluster by partitions > - > > Key: HIVE-1467 > URL: https://issues.apache.org/jira/browse/HIVE-1467 > Project: Hadoop Hive > Issue Type: Improvement >Reporter: Joydeep Sen Sarma >Assignee: Namit Jain > > (based on internal discussion with Ning). Dynamic partitioning should offer a > mode where it clusters data by partition before writing out to each > partition. This will reduce the number of files. Details: > 1. always use a reducer stage > 2. mapper sends to reducer based on the partitioning columns, i.e. reducer = > f(partition-cols) > 3. f() can be made somewhat smart to: >a. spread large partitions across multiple reducers - each mapper can > maintain the row count seen per partition - and then apply (whenever it sees a > new row for a partition): >* reducer = (row count / 64k) % numReducers >Small partitions always go to one reducer. the larger the partition, > the more the reducers. 
this prevents one reducer becoming a bottleneck writing > out one partition >b. this still leaves the issue of a very large number of splits. (64K rows > from 10K mappers is pretty large). for this one can apply one slight > modification: >* reducer = (mapper-id/1024 + row-count/64k) % numReducers >ie. - the first 1000 mappers always send the first 64K rows for one > partition to the same reducer. the next 1000 send it to the next one. and so > on. > the constants 1024 and 64k are used just as an example. i don't know what the > right numbers are. it's also clear that this is a case where we need hadoop > to do only partitioning (and no sorting). this will be a useful feature to > have in hadoop. that will reduce the overhead due to reducers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
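The reducer-assignment heuristic above can be sketched as plain arithmetic (the constants 1024 and 64K are the comment's examples, not tuned values, and the class/method names are hypothetical): small partitions stay on one reducer, large ones spread across several, and the mapper-id term staggers which reducer receives a partition's first chunk.

```java
// Sketch of reducer = (mapper-id/1024 + row-count/64k) % numReducers
// from the HIVE-1467 comment.
public class DpReducerAssign {
    static int reducerFor(int mapperId, long rowsSeenForPartition, int numReducers) {
        return (int) ((mapperId / 1024 + rowsSeenForPartition / 65536) % numReducers);
    }

    public static void main(String[] args) {
        int numReducers = 10;
        // A small partition: rows from the first 1024 mappers all go to reducer 0.
        System.out.println(reducerFor(5, 100, numReducers));    // 0
        // The same partition, once a mapper has seen 64K+ rows, moves on.
        System.out.println(reducerFor(5, 70000, numReducers));  // 1
        // A later block of mappers starts at a different reducer, so no single
        // reducer collects the first chunk of every partition.
        System.out.println(reducerFor(2048, 100, numReducers)); // 2
    }
}
```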
[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results
[ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1598: - Attachment: HIVE-1598.2.patch Attached the test case and also removed some debugging info. These are the only changes. > use SequenceFile rather than TextFile format for hive query results > --- > > Key: HIVE-1598 > URL: https://issues.apache.org/jira/browse/HIVE-1598 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1598.2.patch, HIVE-1598.patch > > > A Hive query's result is written to a temporary directory first, and then > FetchTask takes the files and displays them to the user. Currently the file > format used for the resulting files is TextFile. This can cause > incorrect result display if a string-typed column contains newlines, > which are used as record delimiters by TextInputFormat. Switching to the > SequenceFile format solves this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results
[ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1598: - Attachment: HIVE-1598.patch This patch only adds support for using SequenceFile for query results. There are still questions about whether we should use it for the script operator or not. Will open another JIRA if needed. > use SequenceFile rather than TextFile format for hive query results > --- > > Key: HIVE-1598 > URL: https://issues.apache.org/jira/browse/HIVE-1598 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1598.patch > > > A Hive query's result is written to a temporary directory first, and then > FetchTask takes the files and displays them to the user. Currently the file > format used for the resulting files is TextFile. This can cause > incorrect result display if a string-typed column contains newlines, > which are used as record delimiters by TextInputFormat. Switching to the > SequenceFile format solves this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
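A small sketch (plain Java, not Hive code) of the failure the issue describes: newline-delimited text framing treats an embedded '\n' inside a string column as a record boundary, so one logical row reads back as two records.

```java
// Sketch of TextInputFormat-style framing: one record per line.
public class NewlineSplit {
    static String[] recordsOf(String fileContents) {
        // Split on the record delimiter, keeping trailing empty records.
        return fileContents.split("\n", -1);
    }

    public static void main(String[] args) {
        // A single column value that happens to contain a newline:
        String columnValue = "line one\nline two";
        String[] records = recordsOf(columnValue);
        System.out.println(records.length); // 2: the one row was split in half
        // SequenceFile avoids this because record boundaries are stored out of
        // band (length-prefixed records), so embedded newlines survive intact.
    }
}
```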
[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results
[ https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1598: - Status: Patch Available (was: Open) All 0.17 & 0.20 tests passed. > use SequenceFile rather than TextFile format for hive query results > --- > > Key: HIVE-1598 > URL: https://issues.apache.org/jira/browse/HIVE-1598 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Ning Zhang >Assignee: Ning Zhang > Attachments: HIVE-1598.patch > > > A Hive query's result is written to a temporary directory first, and then > FetchTask takes the files and displays them to the user. Currently the file > format used for the resulting files is TextFile. This can cause > incorrect result display if a string-typed column contains newlines, > which are used as record delimiters by TextInputFormat. Switching to the > SequenceFile format solves this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.