[jira] Created: (HIVE-650) [UDAF] implement GROUP_CONCAT(expr)
[UDAF] implement GROUP_CONCAT(expr)
-----------------------------------

Key: HIVE-650
URL: https://issues.apache.org/jira/browse/HIVE-650
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Min Zhou

It's a very useful UDAF for us.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

GROUP_CONCAT(expr)

This function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr} [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
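The MySQL semantics quoted above (concatenate the non-NULL values of a group; NULL if the group has no non-NULL values) can be sketched in Python. This is an illustrative model only, not Hive code, and it ignores the DISTINCT and ORDER BY options:

```python
def group_concat(values, separator=","):
    """Model of MySQL GROUP_CONCAT: join the non-NULL values of a group
    with the separator; return None (NULL) if no non-NULL values exist."""
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return None
    return separator.join(non_null)
```

For example, `group_concat(["a", None, "b"])` yields `"a,b"`, while a group of all NULLs yields NULL.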
[jira] Commented: (HIVE-594) Add "isNull", "hashCode", "equals", "compareTo" to ObjectInspector
[ https://issues.apache.org/jira/browse/HIVE-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732363#action_12732363 ]

Zheng Shao commented on HIVE-594:
---------------------------------

Without this, we could not do complete object reuse. For example, if all columns in the first row are non-null, we need to create objects for each of the columns. Then the second row comes in and all columns are null, so we have to set all fields in the row object to "null". Then the third row comes in and all columns are non-null, so we need to create new objects again. LazySimpleSerDe has a hack to avoid this by adding a special function "getObject()".

With "isNull" added to ObjectInspector, we don't need to set the field in the row object to null; instead, we can set a bit inside the field object so that it knows it is a null.

We will need to replace the logic "xxx == null" and "xxx != null" in the code (mainly GenericUDFOPNull and GenericUDFOPNotNull).

> Add "isNull", "hashCode", "equals", "compareTo" to ObjectInspector
> ------------------------------------------------------------------
>
> Key: HIVE-594
> URL: https://issues.apache.org/jira/browse/HIVE-594
> Project: Hadoop Hive
> Issue Type: Improvement
> Affects Versions: 0.3.0, 0.3.1
> Reporter: Zheng Shao
> Assignee: Zheng Shao
>
> ObjectInspector should delegate all methods of the object, including "isNull", "hashCode", "equals", "compareTo".
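The object-reuse idea in the comment above can be illustrated with a toy Python class. The names (ReusableField, set_null) are hypothetical and do not correspond to Hive's actual API; the point is that flagging nullness with a bit lets the same field object survive a null row instead of being replaced:

```python
class ReusableField:
    """Toy model of object reuse: rather than setting a row's field to
    None (which forces allocating a new object on the next non-null row),
    keep the field object alive and record nullness in a flag bit."""
    def __init__(self):
        self.value = None
        self.is_null = True

    def set(self, value):
        self.value = value
        self.is_null = False

    def set_null(self):
        # The value object is NOT dropped; only the null bit flips,
        # so the field can be reused for the next non-null row.
        self.is_null = True
```

Across three rows (non-null, null, non-null), a single ReusableField instance is reused instead of being allocated twice.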
[jira] Created: (HIVE-649) [UDF] now() for getting current time
[UDF] now() for getting current time
------------------------------------

Key: HIVE-649
URL: https://issues.apache.org/jira/browse/HIVE-649
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Min Zhou

http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now
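As a rough model of the requested behavior: MySQL's NOW() returns the current date and time, conventionally formatted as 'YYYY-MM-DD HH:MM:SS'. A Python sketch of those semantics (illustrative only, not the Hive UDF):

```python
from datetime import datetime

def now():
    """Model of MySQL NOW(): current date and time as a
    'YYYY-MM-DD HH:MM:SS' string (19 characters)."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
```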
[jira] Assigned: (HIVE-609) optimize multi-group by
[ https://issues.apache.org/jira/browse/HIVE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-609:
-------------------------------

Assignee: Namit Jain

> optimize multi-group by
> -----------------------
>
> Key: HIVE-609
> URL: https://issues.apache.org/jira/browse/HIVE-609
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: hive.609.1.patch
>
> For a query like:
>
> from src
> insert overwrite table dest1 select col1, count(distinct colx) group by col1
> insert overwrite table dest2 select col2, count(distinct colx) group by col2;
>
> If map-side aggregation is turned off, we currently do 4 map-reduce jobs. The plan can be optimized to run in 3 map-reduce jobs by spraying over the distinct column first and then aggregating the individual results. This may not be possible if there are multiple distinct columns, but the above query is very common in data warehousing environments.
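The shared-spray idea can be modeled very loosely in Python: deduplicating (grouping column, group value, distinct value) triples stands in for the first job that sprays on the distinct column, and counting the surviving triples stands in for the per-grouping-column aggregations. This is a semantic sketch under those assumptions, not the actual planner logic:

```python
def multi_groupby_count_distinct(rows, group_cols, distinct_col):
    """Model of: for each grouping column g, compute
    select g, count(distinct distinct_col) group by g,
    sharing one dedup pass over the distinct column."""
    # Stage 1 (models the spray on the distinct column): keep each
    # (grouping column, group value, distinct value) triple once.
    seen = set()
    for row in rows:
        for g in group_cols:
            seen.add((g, row[g], row[distinct_col]))
    # Stage 2: count surviving triples per grouping column and value.
    counts = {g: {} for g in group_cols}
    for g, gval, _ in seen:
        counts[g][gval] = counts[g].get(gval, 0) + 1
    return counts
```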
[jira] Updated: (HIVE-609) optimize multi-group by
[ https://issues.apache.org/jira/browse/HIVE-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-609:
----------------------------

Attachment: hive.609.1.patch

> optimize multi-group by
> -----------------------
>
> Key: HIVE-609
> URL: https://issues.apache.org/jira/browse/HIVE-609
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Attachments: hive.609.1.patch
>
> For a query like:
>
> from src
> insert overwrite table dest1 select col1, count(distinct colx) group by col1
> insert overwrite table dest2 select col2, count(distinct colx) group by col2;
>
> If map-side aggregation is turned off, we currently do 4 map-reduce jobs. The plan can be optimized to run in 3 map-reduce jobs by spraying over the distinct column first and then aggregating the individual results. This may not be possible if there are multiple distinct columns, but the above query is very common in data warehousing environments.
[jira] Updated: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-405:
----------------------------

Resolution: Fixed
Fix Version/s: 0.4.0
Release Note: HIVE-405. Cleanup operator initialization. (Prasad Chakka via zshao)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Prasad.

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Fix For: 0.4.0
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.9.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Updated: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-405:
-------------------------------

Attachment: hive-405.9.patch

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.9.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Commented: (HIVE-645) A UDF that can export data to JDBC databases.
[ https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732309#action_12732309 ]

Namit Jain commented on HIVE-645:
---------------------------------

Actually, this is more of a user-defined procedure than a user-defined function. Should we support them in some other way than as a select list item? Something like:

begin
  proc();
end;

> A UDF that can export data to JDBC databases.
> ---------------------------------------------
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
> A UDF that can export data to JDBC databases.
[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.
[ https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-645:
---------------------------------

Release Note: Provides DBOutputUDF
Status: Patch Available (was: Open)

> A UDF that can export data to JDBC databases.
> ---------------------------------------------
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
> A UDF that can export data to JDBC databases.
[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.
[ https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-645:
---------------------------------

Attachment: hive-645-2.patch

Better use of object inspectors.

> A UDF that can export data to JDBC databases.
> ---------------------------------------------
>
> Key: HIVE-645
> URL: https://issues.apache.org/jira/browse/HIVE-645
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Priority: Minor
> Attachments: hive-645-2.patch, hive-645.patch
>
> A UDF that can export data to JDBC databases.
[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)
[ https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-646:
----------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Emil.

> UDFs for conversion between different number bases (conv, hex, bin)
> --------------------------------------------------------------------
>
> Key: HIVE-646
> URL: https://issues.apache.org/jira/browse/HIVE-646
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Emil Ibrishimov
> Assignee: Emil Ibrishimov
> Fix For: 0.4.0
> Attachments: HIVE-646.1.patch
>
> Add conv, hex and bin UDFs
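A base-conversion UDF of this kind presumably behaves like MySQL's CONV (string in, string out, bases 2 through 36), with hex and bin as the base-16 and base-2 special cases. A hedged Python sketch of those semantics, not the committed Hive implementation:

```python
def conv(num_str, from_base, to_base):
    """Model of a CONV-style UDF: reinterpret num_str from from_base
    into to_base, using digits 0-9 then A-Z (bases up to 36)."""
    n = int(num_str, from_base)
    if n == 0:
        return "0"
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    out = []
    while n:
        n, r = divmod(n, to_base)
        out.append(digits[r])
    return "".join(reversed(out))

def hex_udf(n):
    """hex(n) is conv from base 10 to base 16."""
    return conv(str(n), 10, 16)

def bin_udf(n):
    """bin(n) is conv from base 10 to base 2."""
    return conv(str(n), 10, 2)
```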
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732276#action_12732276 ]

Namit Jain commented on HIVE-405:
---------------------------------

I don't like it, but it is a minor issue and can be done later, so fine.

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732268#action_12732268 ]

Prasad Chakka commented on HIVE-405:
------------------------------------

For now I will leave the initializeChildren() call inside initializeOp(). What do you guys say?

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732234#action_12732234 ]

Zheng Shao commented on HIVE-405:
---------------------------------

Talked with Prasad on this offline. Since we already have an example of initializing children before we can finish all the initializations (ScriptOperator kicking off the thread that gets data from the script), I think it makes sense to keep the "initializeChildren" call inside the customized initializeOp().

Another way to do this is to add a post-order initialize recursive call to the Operator. That is probably the cleanest approach: we will first do pre-order initialization, and then post-order initialization.

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
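The pre-order/post-order idea discussed above can be illustrated with a toy operator tree in Python. The class and method names are illustrative only, not Hive's actual Operator API:

```python
class Operator:
    """Toy operator tree: one recursive walk gives a pre-order hook
    (parent before children) and a post-order hook (children before
    parent), so an operator can finish setup after its children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def initialize(self, log):
        log.append(("pre", self.name))    # pre-order initialization
        for child in self.children:
            child.initialize(log)
        log.append(("post", self.name))   # post-order initialization
```

A chain TS -> FIL -> SEL would be pre-initialized top-down and post-initialized bottom-up in a single traversal.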
[jira] Commented: (HIVE-647) SORT BY with GROUP ignored without LIMIT
[ https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732231#action_12732231 ]

Ashish Thusoo commented on HIVE-647:
------------------------------------

Actually, sort by is supposed to be a local sort within a reducer instead of a global sort. It is usually used along with distribute by to define the manner in which the keys are distributed to a reducer and sorted within a reducer. I believe that if you used order by instead of sort by, we automatically select 1 reducer and do the sort.

> SORT BY with GROUP ignored without LIMIT
> ----------------------------------------
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when a LIMIT is not supplied. If I run the following two queries, the first returns properly sorted results. The second does not.
>
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;
>
> Explain is different for the two queries as well. The first uses 3 M/R jobs and the second only uses 2, which might be part of the problem.
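The SORT BY vs. ORDER BY distinction in the comment above can be modeled in Python: with more than one reducer, each partition is sorted locally, so the concatenated output is not globally sorted; with one reducer the two coincide. Round-robin stands in for hash partitioning here, and this is illustrative only, not Hive's execution code:

```python
def sort_by(rows, key, num_reducers=2):
    """Model of SORT BY: each reducer sorts only its own partition, so
    the combined output is globally sorted only when num_reducers == 1."""
    parts = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        parts[i % num_reducers].append(row)   # stand-in for hash partitioning
    result = []
    for part in parts:
        result.extend(sorted(part, key=key))  # local sort per reducer
    return result

def order_by(rows, key):
    """Model of ORDER BY: a single reducer gives one global sort."""
    return sort_by(rows, key, num_reducers=1)
```

This also mirrors Bill's observation below that forcing mapred.reduce.tasks=1 makes the SORT BY output come back fully sorted.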
[jira] Updated: (HIVE-644) change default size for merging files at the end of the job
[ https://issues.apache.org/jira/browse/HIVE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-644:
----------------------------

Resolution: Fixed
Fix Version/s: 0.4.0
Release Note: HIVE-644. Change default size for merging files to 256MB. (Namit Jain via zshao)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Namit!

> change default size for merging files at the end of the job
> ------------------------------------------------------------
>
> Key: HIVE-644
> URL: https://issues.apache.org/jira/browse/HIVE-644
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Fix For: 0.4.0
> Attachments: hive.644.1.patch
>
> Currently, the size is 1G and the reducers end up taking a really long time.
[jira] Created: (HIVE-648) Better info messages on the command line
Better info messages on the command line
----------------------------------------

Key: HIVE-648
URL: https://issues.apache.org/jira/browse/HIVE-648
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Zheng Shao

We want to print out some better info messages on the hive command line. Currently we show the progress of each map-reduce job, but it's hard to see which map-reduce job we are currently running. We should show the information in a more structured way.

This is dependent on another issue where we want to show the total number of tasks for each query (instead of just map-reduce jobs).
[jira] Commented: (HIVE-647) SORT BY with GROUP ignored without LIMIT
[ https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732176#action_12732176 ]

Bill Graham commented on HIVE-647:
----------------------------------

Note that the second query does sort properly if I explicitly set the number of reducers to 1 with the following command:

set mapred.reduce.tasks=1;

> SORT BY with GROUP ignored without LIMIT
> ----------------------------------------
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when a LIMIT is not supplied. If I run the following two queries, the first returns properly sorted results. The second does not.
>
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;
>
> Explain is different for the two queries as well. The first uses 3 M/R jobs and the second only uses 2, which might be part of the problem.
[jira] Updated: (HIVE-647) SORT BY with GROUP ignored without LIMIT
[ https://issues.apache.org/jira/browse/HIVE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HIVE-647:
-----------------------------

Summary: SORT BY with GROUP ignored without LIMIT (was: ORDER BY with GROUP ignored without LIMIT)

> SORT BY with GROUP ignored without LIMIT
> ----------------------------------------
>
> Key: HIVE-647
> URL: https://issues.apache.org/jira/browse/HIVE-647
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Bill Graham
>
> For queries with GROUP BY and SORT BY, the sort is not handled properly when a LIMIT is not supplied. If I run the following two queries, the first returns properly sorted results. The second does not.
>
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
> SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;
>
> Explain is different for the two queries as well. The first uses 3 M/R jobs and the second only uses 2, which might be part of the problem.
[jira] Updated: (HIVE-396) Hive performance benchmarks
[ https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuntao Jia updated HIVE-396:
----------------------------

Attachment: (was: hive_benchmark_2009-07-12.pdf)

> Hive performance benchmarks
> ---------------------------
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, hive_benchmark_2009-07-12.tar.gz
>
> We need some performance benchmarks to measure and track the performance improvements of Hive.
>
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix
[jira] Updated: (HIVE-396) Hive performance benchmarks
[ https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuntao Jia updated HIVE-396:
----------------------------

Attachment: hive_benchmark_2009-07-12.pdf

Revised the benchmark report. Thanks to Raghu Murthy for his help.

> Hive performance benchmarks
> ---------------------------
>
> Key: HIVE-396
> URL: https://issues.apache.org/jira/browse/HIVE-396
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Zheng Shao
> Attachments: hive_benchmark_2009-06-18.pdf, hive_benchmark_2009-06-18.tar.gz, hive_benchmark_2009-07-12.pdf, hive_benchmark_2009-07-12.tar.gz
>
> We need some performance benchmarks to measure and track the performance improvements of Hive.
>
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix
[jira] Created: (HIVE-647) ORDER BY with GROUP ignored without LIMIT
ORDER BY with GROUP ignored without LIMIT
-----------------------------------------

Key: HIVE-647
URL: https://issues.apache.org/jira/browse/HIVE-647
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Bill Graham

For queries with GROUP BY and SORT BY, the sort is not handled properly when a LIMIT is not supplied. If I run the following two queries, the first returns properly sorted results. The second does not.

SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC LIMIT 50;
SELECT user, SUM(numRequests) AS num FROM MyTable GROUP BY user SORT BY num DESC;

Explain is different for the two queries as well. The first uses 3 M/R jobs and the second only uses 2, which might be part of the problem.
[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time
[ https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732120#action_12732120 ]

Alex Loddengaard commented on HIVE-425:
---------------------------------------

I've used both Struts and Spring, but I'm pretty certain both are much larger than 5MB. That said, I found each of them to be good in their own separate ways. I recall liking Spring better. It's been a long time since my JSP MVC fiddling :). Struts is an Apache project, though: [http://struts.apache.org/].

Sorry I can't provide more insight, Edward.

> HWI JSP pages should be compiled at build-time instead of run-time
> -------------------------------------------------------------------
>
> Key: HIVE-425
> URL: https://issues.apache.org/jira/browse/HIVE-425
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Web UI
> Reporter: Alex Loddengaard
>
> HWI JSP pages are compiled via the ant jar at run-time. Doing so at run-time requires ant as a dependency and also makes developing slightly more tricky, as compiler errors are not discovered until HWI is deployed and running. HWI should be instrumented in such a way that the JSP pages are compiled by ant at build-time instead, just as the Hadoop status pages are.
[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time
[ https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732112#action_12732112 ]

Edward Capriolo commented on HIVE-425:
--------------------------------------

Alex,

Here are the current problems I see with JSP:

* Hard to check syntax at compile time.
* References to other parts of Hive (not many, but some) may become outdated with API changes.
* Hard to unit test. Our unit tests test the SessionManager, not the actual JSP flow of passing form items.

So the ideal framework would be:

* Small (<5MB). We don't want to make Hive bigger, just better.
* Mostly .java and servlet based (little/no JSP).
* Possibly a way to unit test a transaction: Login -> RunQuery -> Verify Results.
* Apache-compatible license.
* XML/CSS (if it helps eliminate JSP).
* Low complexity (I am not interested in learning what an org.springframework.transaction.interceptor.AbstractFallbackTransactionAttributeSource is, for example :) ).

I know little about JSP frameworks. Do you have one that fits the needs?

> HWI JSP pages should be compiled at build-time instead of run-time
> -------------------------------------------------------------------
>
> Key: HIVE-425
> URL: https://issues.apache.org/jira/browse/HIVE-425
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Web UI
> Reporter: Alex Loddengaard
>
> HWI JSP pages are compiled via the ant jar at run-time. Doing so at run-time requires ant as a dependency and also makes developing slightly more tricky, as compiler errors are not discovered until HWI is deployed and running. HWI should be instrumented in such a way that the JSP pages are compiled by ant at build-time instead, just as the Hadoop status pages are.
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732057#action_12732057 ]

Namit Jain commented on HIVE-405:
---------------------------------

>> Are you suggesting that we keep the initializeChildren() method and override it in MapOperator()?

Yes. MapOperator() becomes more customized, but all other operators become simplified: initializeOp() just calls operator-specific initialization, if any. By default, all children are initialized after that with the outputObjectInspector (assuming there is only one). MapOperator overrides this behavior.

If we want to make it more general, we can have an array of outputObjectInspectors, one for each child. They will be the same except for MapOperator, but then MapOperator can also fit in this framework. I don't think it is worth it to generalize at that level.

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732053#action_12732053 ]

Prasad Chakka commented on HIVE-405:
------------------------------------

Are you suggesting that we keep the initializeChildren() method and override it in MapOperator()? I think the current initialization methods have distinct functionalities:

1) Operator.initialize() - makes sure that all parents are initialized before the operator is initialized. This also initializes common structures needed for all operators. This is the only public initialization method.
2) Operator.initializeOp() - does operator-specific initialization, including initialization of children. It is up to the operator in what order child operators need to be initialized. The base implementation will just call initializeChildren() with the output ObjectInspector. This is a protected method.
3) Operator.initializeChildren() - calls initialize() on all children. This is a protected method.

I think what we have here and what you are proposing are pretty similar, except that MapOperator() becomes more customized. But I agree that there should be an outputObjectInspector field in Operator.java and that it should be used while calling initialize() on children in Operator.java. I will make that change.

> Cleanup operator initialization
> -------------------------------
>
> Key: HIVE-405
> URL: https://issues.apache.org/jira/browse/HIVE-405
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Prasad Chakka
> Priority: Critical
> Attachments: hive-405.6.patch, hive-405.7.patch, hive-405.patch
>
> We are always passing the same ObjectInspector, so there is no need to pass it again and again in forward.
> Also there is a problem that can ONLY be fixed by passing ObjectInspector in init: Outer Joins. Outer Joins may not be able to get ObjectInspectors for all inputs; as a result, there is no way to construct an output ObjectInspector based on the inputs. Currently we have hard-coded code that assumes joins are always outputting Strings, which did break but was hidden by the old framework (because we do toString() when serializing the output, and toString() is defined for all Java classes).
[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time
[ https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732039#action_12732039 ] Alex Loddengaard commented on HIVE-425: --- Hi Edward, which MVC frameworks are you going to consider? I assume Struts and Spring don't fall into the "light" category, yeah? I'm mostly just curious. > HWI JSP pages should be compiled at build-time instead of run-time > -- > > Key: HIVE-425 > URL: https://issues.apache.org/jira/browse/HIVE-425 > Project: Hadoop Hive > Issue Type: Improvement > Components: Web UI >Reporter: Alex Loddengaard > > HWI JSP pages are compiled via the ant jar at run-time. Doing so at run-time > requires ant as a dependency and also makes developing slightly more tricky, > as compiler errors are not discovered until HWI is deployed and running. HWI > should be instrumented in such a way where the JSP pages are compiled by ant > at build-time instead, just as the Hadoop status pages are. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732040#action_12732040 ] Namit Jain commented on HIVE-405: - As you said, 2. does not matter - I don't think anything will change if we move initializeChildren() to the end. 1. is needed - MapOperator is an exception, and it needs a different walker; it can override initializeChildren() to do nothing. In some sense, it has more than one outputObjectInspector, which is not true for any other operator - ExecMapper and ExecReducer are different. I still think we should do 1. and treat MapOperator as an exception.
[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)
[ https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Ibrishimov updated HIVE-646: - Fix Version/s: 0.4.0 Status: Patch Available (was: Open) > UDFs for conversion between different number bases (conv, hex, bin) > --- > > Key: HIVE-646 > URL: https://issues.apache.org/jira/browse/HIVE-646 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Emil Ibrishimov >Assignee: Emil Ibrishimov > Fix For: 0.4.0 > > Attachments: HIVE-646.1.patch > > > Add conv, hex and bin UDFs -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)
[ https://issues.apache.org/jira/browse/HIVE-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Ibrishimov updated HIVE-646: - Attachment: HIVE-646.1.patch Does anyone know a faster way to check for overflow and for unsigned division in Java (than the one implemented in conv)?
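One commonly used approach to the question above, offered as a sketch rather than a claim about what the HIVE-646 patch does: treat the accumulator as an unsigned 64-bit value, do unsigned division by halving first (pre-Java-8 has no Long.divideUnsigned), and detect overflow in "acc * base + digit" with a single unsigned comparison against a precomputed bound. All names here are illustrative, and the saturate-on-overflow behavior (matching MySQL CONV's clipping) is an assumption.

```java
class UnsignedMath {

    // Unsigned comparison: flipping the sign bit maps unsigned
    // order onto signed order.
    static boolean unsignedLess(long a, long b) {
        return (a ^ Long.MIN_VALUE) < (b ^ Long.MIN_VALUE);
    }

    // Unsigned division by a divisor that is positive as a signed long:
    // halve the dividend first to stay in signed range, then fix up.
    static long divideUnsigned(long dividend, long divisor) {
        long quotient = ((dividend >>> 1) / divisor) << 1;
        long remainder = dividend - quotient * divisor;
        if (!unsignedLess(remainder, divisor)) {
            quotient++;
        }
        return quotient;
    }

    // Accumulate one digit in the given base, treating acc as unsigned.
    // Overflow iff acc > (2^64 - 1 - digit) / base; on overflow return
    // -1L (all bits set, i.e. the unsigned maximum 2^64 - 1).
    static long addDigit(long acc, int digit, int base) {
        long bound = divideUnsigned(-1L - digit, base);
        if (unsignedLess(bound, acc)) {
            return -1L; // would overflow: saturate
        }
        return acc * base + digit;
    }
}
```

The overflow test costs one division per digit plus one comparison, with no BigInteger allocation.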
[jira] Created: (HIVE-646) UDFs for conversion between different number bases (conv, hex, bin)
UDFs for conversion between different number bases (conv, hex, bin) --- Key: HIVE-646 URL: https://issues.apache.org/jira/browse/HIVE-646 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Emil Ibrishimov Assignee: Emil Ibrishimov Add conv, hex and bin UDFs -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
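For reference, the intended semantics of the three UDFs (mirroring MySQL's HEX, BIN, and CONV) can be illustrated with plain Java library calls. This is a simplified sketch only: real CONV operates on unsigned 64-bit values and accepts negative bases, which signed parseLong/toString do not handle.

```java
class BaseConv {

    // HEX(n): hexadecimal string of n, uppercase as in MySQL.
    static String hex(long n) {
        return Long.toHexString(n).toUpperCase();
    }

    // BIN(n): binary string of n.
    static String bin(long n) {
        return Long.toBinaryString(n);
    }

    // CONV(num, fromBase, toBase): reinterpret num from one base in another.
    // Signed-only illustration of the semantics, not the real algorithm.
    static String conv(String num, int fromBase, int toBase) {
        long value = Long.parseLong(num, fromBase);
        return Long.toString(value, toBase).toUpperCase();
    }
}
```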
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732029#action_12732029 ] Prasad Chakka commented on HIVE-405: I thought about it but didn't do it for two reasons: 1) any operator can choose not to call initializeChildren(), e.g. MapOperator does not call initializeChildren() directly but calls initialize() on children directly with a different ObjectInspector object for each child; 2) ScriptOperator.initializeOp() seems to do some more stuff after calling initializeChildren(), which is different from calling initializeChildren() after calling initializeOp(). Even if 2) doesn't matter, I don't see how we can get around 1) in the scheme you suggested. If you agree, then I am going to upload a new patch which fixes TestOperators (which was failing after merging with trunk).
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732025#action_12732025 ] Namit Jain commented on HIVE-405: - Other than that, it looks good. You don't need to set state = State.INIT in initializeChildren(); Operator.initialize() will look like: ... initializeOp(); state = State.INIT; initializeChildren(); ... and initializeChildren() can be a private method which no one else should call.
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732019#action_12732019 ] Namit Jain commented on HIVE-405: - Thinking about it more, you should add an outputObjectInspector to Operator, which can be defaulted to inputObjectInspector[0] but can be overridden by a specific initializeOp() - for example, groupBy. This way, initializeChildren() in Operator need not be called with inputObjectInspector[0]; it can be called with outputObjectInspector.
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732016#action_12732016 ] Namit Jain commented on HIVE-405: - Does initializeOp() even need to call initializeChildren()? This can be removed from all implementations of initializeOp() and can be added to Operator.initialize() after initializeOp()
[jira] Updated: (HIVE-645) A UDF that can export data to JDBC databases.
[ https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-645: - Attachment: hive-645.patch First Draft patch for comments. > A UDF that can export data to JDBC databases. > - > > Key: HIVE-645 > URL: https://issues.apache.org/jira/browse/HIVE-645 > Project: Hadoop Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Edward Capriolo >Assignee: Edward Capriolo >Priority: Minor > Attachments: hive-645.patch > > > A UDF that can export data to JDBC databases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-645) A UDF that can export data to JDBC databases.
A UDF that can export data to JDBC databases. - Key: HIVE-645 URL: https://issues.apache.org/jira/browse/HIVE-645 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor A UDF that can export data to JDBC databases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Chakka updated HIVE-405: --- Attachment: hive-405.7.patch resolved conflicts with GroupByOperator.java and UnionOperator.java. I am testing the latter conflict but uploaded the latest patch.
[jira] Updated: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Chakka updated HIVE-405: --- Attachment: (was: hive-405.7.patch)
[jira] Updated: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Chakka updated HIVE-405: --- Attachment: hive-405.7.patch
[jira] Commented: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time
[ https://issues.apache.org/jira/browse/HIVE-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731990#action_12731990 ] Edward Capriolo commented on HIVE-425: -- I have tried several things to make this process work. Without citing any one specific error, the 'org.apache.jasper.JspC' compilation seems to require a specific ant/jetty combination. I want to look into a light MVC framework, something that will get all the code into servlets/Java classes. That way it will be pre-compiled.
Hudson build is back to normal: Hive-trunk-h0.18 #158
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/158/changes
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731970#action_12731970 ] Namit Jain commented on HIVE-405: - Can you regenerate the patch - there are some conflicts
Re: Error on Load into multiple Partitions
Namit. I just Updated to revision 794686 and that worked. It looks like Zheng committed this patch in the afternoon and this failed for me earlier that morning. Bad luck on my timing but I'm happy it works now. Thanks. -Matt On Thu, Jul 16, 2009 at 10:09 AM, Namit Jain wrote: > Most probably, this is the same as > > https://issues.apache.org/jira/browse/HIVE-636 > > which was merged just a days back. Can you try on the latest trunk ? > > > > > On 7/16/09 6:45 AM, "Matt Pestritto" wrote: > > Does anyone have any idea as to the reason for this error ? > > Thanks in Advance > -Matt > > -- Forwarded message -- > From: Matt Pestritto > Date: Wed, Jul 15, 2009 at 10:09 AM > Subject: Error on Load into multiple Partitions > To: hive-dev@hadoop.apache.org > > > Hi All. > > Are there are existing test cases that load into multiple partitions using > a > single from query? This query worked in an older revision but the mappers > fails when I run on trunk: > > java.lang.RuntimeException: Map operator initialization failed > >at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) >at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) >at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) > > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) >at > org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) > >at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) >at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) > > > Here is a simplified version of what I'm running and DDL to support: > *create table test_m ( client int, description string ) > row format delimited fields terminated by '\011' lines terminated by > '\012' stored as textfile; > * > *create table test_m_p ( description string ) > partitioned by ( client int ) row format 
delimited fields terminated by > '\011' lines terminated by '\012' stored as textfile; > * > *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * > > *FROM test_m > INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description > where client=1 > INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description > where client=2 ; > * > --- contents of /tmp/m.lst > 1test > 1test2 > 1test3 > 2hi > 2hi1 > 2hi3 > > Thanks! > -Matt > >
[jira] Commented: (HIVE-405) Cleanup operator initialization
[ https://issues.apache.org/jira/browse/HIVE-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731954#action_12731954 ] Namit Jain commented on HIVE-405: - I will take a look and get back to you.
[jira] Assigned: (HIVE-604) remove Schema from metastore
[ https://issues.apache.org/jira/browse/HIVE-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-604: --- Assignee: Min Zhou (was: Namit Jain) > remove Schema from metastore > > > Key: HIVE-604 > URL: https://issues.apache.org/jira/browse/HIVE-604 > Project: Hadoop Hive > Issue Type: Bug >Reporter: Namit Jain >Assignee: Min Zhou > Fix For: 0.4.0 > > > Since Schema is not used by metastore, remove it from there -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-576) complete jdbc driver
[ https://issues.apache.org/jira/browse/HIVE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731952#action_12731952 ] Namit Jain commented on HIVE-576: - Now that Hive is using ASF Thrift, you can start by modifying getSchema() - remove the new Schema() object from the metastore. The Hive client can return Hive types instead of Thrift types. We can have more discussion on https://issues.apache.org/jira/browse/HIVE-604 - I will assign that to you. > complete jdbc driver > > > Key: HIVE-576 > URL: https://issues.apache.org/jira/browse/HIVE-576 > Project: Hadoop Hive > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Min Zhou >Assignee: Min Zhou > Fix For: 0.4.0 > > Attachments: HIVE-576.1.patch, HIVE-576.2.patch, sqlexplorer.jpg > > > hive only support a few interfaces of jdbc, let's complete it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-541) Implement UDFs: INSTR and LOCATE
[ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-541. - Resolution: Fixed Hadoop Flags: [Reviewed] Committed. Thanks Min > Implement UDFs: INSTR and LOCATE > > > Key: HIVE-541 > URL: https://issues.apache.org/jira/browse/HIVE-541 > Project: Hadoop Hive > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Zheng Shao >Assignee: Min Zhou > Attachments: HIVE-541.1.patch, HIVE-541.2.patch > > > http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr > http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate > These functions can be directly implemented with Text (instead of String). > This will make the test of whether one string contains another string much > faster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
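The issue's note about implementing these directly with Text (instead of String) can be sketched as follows: the search runs on the raw UTF-8 byte arrays, so no per-row decoding into String objects is needed. This is an illustration of the idea, not the committed HIVE-541 patch; a production version would also have to convert the matched byte offset to a character offset for multi-byte data, which this single-byte-character sketch ignores.

```java
class ByteInstr {

    // 1-based position of the first occurrence of needle in haystack,
    // or 0 when absent (MySQL INSTR convention).
    static int instr(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 1; // empty pattern matches at the start
        }
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer; // mismatch: slide the window
                }
            }
            return i + 1;
        }
        return 0;
    }
}
```

With Hadoop's Text, the same loop would run over getBytes() up to getLength(), reusing the buffers across rows.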
Re: Error on Load into multiple Partitions
Most probably, this is the same as https://issues.apache.org/jira/browse/HIVE-636 which was merged just a days back. Can you try on the latest trunk ? On 7/16/09 6:45 AM, "Matt Pestritto" wrote: Does anyone have any idea as to the reason for this error ? Thanks in Advance -Matt -- Forwarded message -- From: Matt Pestritto Date: Wed, Jul 15, 2009 at 10:09 AM Subject: Error on Load into multiple Partitions To: hive-dev@hadoop.apache.org Hi All. Are there are existing test cases that load into multiple partitions using a single from query? This query worked in an older revision but the mappers fails when I run on trunk: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) Here is a simplified version of what I'm running and DDL to support: *create table test_m ( client int, description string ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *create table test_m_p ( description string ) partitioned by ( client int ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * *FROM test_m INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description where client=1 INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description where client=2 ; * --- contents of /tmp/m.lst 1test 
1test2 1test3 2hi 2hi1 2hi3 Thanks! -Matt
Hudson build is back to normal: Hive-trunk-h0.17 #156
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/156/changes
Fwd: Error on Load into multiple Partitions
Does anyone have any idea as to the reason for this error ? Thanks in Advance -Matt -- Forwarded message -- From: Matt Pestritto Date: Wed, Jul 15, 2009 at 10:09 AM Subject: Error on Load into multiple Partitions To: hive-dev@hadoop.apache.org Hi All. Are there are existing test cases that load into multiple partitions using a single from query? This query worked in an older revision but the mappers fails when I run on trunk: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198) Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:176) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:204) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:264) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:103) Here is a simplified version of what I'm running and DDL to support: *create table test_m ( client int, description string ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *create table test_m_p ( description string ) partitioned by ( client int ) row format delimited fields terminated by '\011' lines terminated by '\012' stored as textfile; * *LOAD DATA LOCAL INPATH '/tmp/m.lst' OVERWRITE INTO TABLE test_m ; * *FROM test_m INSERT OVERWRITE TABLE test_m_p PARTITION ( client=1 ) select description where client=1 INSERT OVERWRITE TABLE test_m_p PARTITION ( client=2 ) select description where client=2 ; * --- contents of /tmp/m.lst 1test 1test2 1test3 2hi 2hi1 2hi3 Thanks! -Matt
[jira] Resolved: (HIVE-515) [UDF] new string function INSTR(str,substr)
[ https://issues.apache.org/jira/browse/HIVE-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou resolved HIVE-515. --- Resolution: Duplicate duplicates [#HIVE-541] > [UDF] new string function INSTR(str,substr) > --- > > Key: HIVE-515 > URL: https://issues.apache.org/jira/browse/HIVE-515 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: Min Zhou >Assignee: Min Zhou > Attachments: HIVE-515-2.patch, HIVE-515.patch > > > UDF for string function INSTR(str,substr) > This extends the function from MySQL > http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_instr > usage: > INSTR(str, substr) > INSTR(str, substr, start) > example: > {code:sql} > select instr('abcd', 'abc') from pokes; // all result are '1' > select instr('abcabc', 'ccc') from pokes; // all result are '0' > select instr('abcabc', 'abc', 2) from pokes; // all result are '4' > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE
[ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731858#action_12731858 ] Min Zhou commented on HIVE-541: --- all test cases passed on my side, how's yours?