[jira] Updated: (HIVE-1524) parallel execution failed if mapred.job.name is set

2010-09-27 Thread yourchanges (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yourchanges updated HIVE-1524:
--

Attachment: HIVE-1524-for-Hive-0.6.patch

Backport of HIVE-1524 to the Hive 0.6 branch.

> parallel execution failed if mapred.job.name is set
> ---
>
> Key: HIVE-1524
> URL: https://issues.apache.org/jira/browse/HIVE-1524
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1524-for-Hive-0.6.patch, HIVE-1524.2.patch, 
> HIVE-1524.patch
>
>
> The plan file name was generated based on mapred.job.name. If the user 
> specifies mapred.job.name before the query, two parallel queries will have 
> conflicting plan file names. 
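
A sketch of one way to avoid the collision described above (illustrative only; the attached patches define the real fix): derive the plan file path from a per-query unique ID rather than from the user-settable mapred.job.name. The class, method, and parameter names here are hypothetical.

```java
import java.util.UUID;

// Sketch only, not the committed HIVE-1524 patch: build the plan file name
// from a unique per-query token instead of mapred.job.name, so two parallel
// queries that share a job name can never collide on the same plan path.
public class PlanFileName {
    // scratchDir and queryId are hypothetical parameters, not Hive's real API.
    public static String uniquePlanPath(String scratchDir, String queryId) {
        // A random UUID guarantees distinct paths even for identical job names.
        return scratchDir + "/" + queryId + "-" + UUID.randomUUID() + ".plan";
    }
}
```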

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915626#action_12915626
 ] 

Liyin Tang commented on HIVE-1642:
--

Are we talking about dynamically changing the execution plan based on the 
table size?

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.
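
As a rough illustration of the idea (not Hive's actual optimizer code), the conversion decision boils down to checking whether the smaller join input fits under a size threshold so it can be loaded into memory on every mapper; the class name and threshold below are assumptions:

```java
// Illustrative sketch only: convert a join to a map-join when the smaller
// table fits under a size threshold, making it cheap to broadcast the whole
// table to each mapper. Hive exposes a configurable setting for the actual
// threshold; the constant here is a made-up placeholder.
public class MapJoinDecider {
    static final long SMALL_TABLE_THRESHOLD_BYTES = 25L * 1024 * 1024;

    public static boolean shouldConvertToMapJoin(long leftBytes, long rightBytes) {
        // A map-join is worthwhile if at least one side is small enough to
        // hold in memory on all mappers.
        return Math.min(leftBytes, rightBytes) <= SMALL_TABLE_THRESHOLD_BYTES;
    }
}
```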




[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915619#action_12915619
 ] 

Namit Jain commented on HIVE-1157:
--

The changes looked good, but I got the following error:

[junit] Begin query: alter1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit2/hive_commit2/build/ql/test/logs/clientpositive/alter1.q.out
 
/data/users/njain/hive_commit2/hive_commit2/ql/src/test/results/clientpositive/alter1.q.out
[junit] 778d777
[junit] < Resource ../data/files/TestSerDe.jar already added.


Philip, can you take care of that?

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}
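
The download-locally approach Philip and Zheng settle on above can be sketched as follows. This is illustrative only: the real patch would use Hadoop's FileSystem API to read from HDFS, while this self-contained version copies via java.net.URL streams (which a stock JVM supports for file: and http:, not hdfs:). All class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch of "add jar" with remote jars: detect a non-local URI, copy the jar
// down to a local temp file, then load the local copy into a URLClassLoader.
public class AddJar {
    public static boolean isLocal(URI uri) {
        String scheme = uri.getScheme();
        return scheme == null || scheme.equals("file");
    }

    public static Path localize(URI jarUri) throws IOException {
        if (isLocal(jarUri)) {
            return Paths.get(jarUri.getPath());
        }
        // Remote jar: copy it to the local filesystem first.
        Path local = Files.createTempFile("hive-addjar-", ".jar");
        try (InputStream in = jarUri.toURL().openStream()) {
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        }
        return local;
    }

    public static URLClassLoader classLoaderFor(Path localJar) throws IOException {
        return new URLClassLoader(new URL[] { localJar.toUri().toURL() },
                                  AddJar.class.getClassLoader());
    }
}
```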




[jira] Resolved: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1671.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Bennie

> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> We have 2 threads running at 100%,
> with a stack trace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
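
A java.util.HashMap that is mutated by several threads without synchronization can end up with a corrupted bucket chain, after which get() spins forever; that matches the threads pinned at 100% inside HashMap.get above. A sketch of one fix for a pathToCS-style cache (the attached HIVE-1671 patch may differ; the class below is a hypothetical stand-in):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only, not the committed patch: replacing the unsynchronized HashMap
// with a ConcurrentHashMap (or synchronizing all access) removes the race
// that can leave get() looping on a corrupted bucket chain.
public class PathToContentSummaryCache {
    // Long is a stand-in for org.apache.hadoop.fs.ContentSummary, so the
    // sketch stays self-contained.
    private final Map<String, Long> pathToCS = new ConcurrentHashMap<>();

    public Long getCS(String path) {
        return pathToCS.get(path);   // safe under concurrent writers
    }

    public void putCS(String path, long summary) {
        pathToCS.put(path, summary);
    }
}
```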




[jira] Commented: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915516#action_12915516
 ] 

Namit Jain commented on HIVE-1667:
--

Is it true that the first group is always the primary group (as in Unix)?
If org.apache.hadoop.security.UserGroupInformation guarantees that, your 
approach seems correct.

One more thing to check: what happens when a partition is created via 
insert overwrite? The temp folder is moved to the final folder, so you may 
have to fix that path as well.
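
The first-group assumption discussed above can be made explicit in a tiny helper. This is hypothetical illustration only, assuming (as the comment asks, without confirming) that UserGroupInformation lists the primary group first:

```java
// Hypothetical helper, not Hive's code: pick the primary group from the
// group list a UGI would return, assuming Unix-like ordering where the
// primary group comes first.
public class OwnerGroup {
    public static String primaryGroup(String[] groupNames) {
        // Fall back to null when no groups are known.
        return (groupNames != null && groupNames.length > 0) ? groupNames[0] : null;
    }
}
```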

> Store the group of the owner of the table in metastore
> --
>
> Key: HIVE-1667
> URL: https://issues.apache.org/jira/browse/HIVE-1667
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Namit Jain
> Attachments: hive-1667.patch
>
>
> Currently, the group of the owner of the table is not stored in the metastore.
> Secondly, if you create a table, the table's owner group is set to the group 
> for the parent. It is not read from the UGI passed in.




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915508#action_12915508
 ] 

Namit Jain commented on HIVE-1671:
--

OK, I can now see the problem.

+1





[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915469#action_12915469
 ] 

Namit Jain commented on HIVE-1157:
--

This is good to have - I will take a look




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915468#action_12915468
 ] 

Bennie Schut commented on HIVE-1671:


Sorry, I was a bit short on the description. I'm running the HiveServer with 
hive.exec.parallel set to true, running many jobs each day. About a week 
after startup, I noticed 2 threads stuck at 100% CPU for about 3 days. I used 
jstack to look at both threads, and they showed the same stack trace.




[jira] Resolved: (HIVE-1606) For a null value in a string column, JDBC driver returns the string "NULL"

2010-09-27 Thread Steven Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Wong resolved HIVE-1606.
---

Fix Version/s: 0.7.0
   Resolution: Fixed

Fixed by HIVE-1378.

> For a null value in a string column, JDBC driver returns the string "NULL"
> --
>
> Key: HIVE-1606
> URL: https://issues.apache.org/jira/browse/HIVE-1606
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Drivers
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>Assignee: Steven Wong
> Fix For: 0.7.0
>
>
> It should return null instead of "NULL".
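
A minimal sketch of the expected behavior (assumed from the issue title, not taken from the driver's code): whatever token the fetched row uses to encode SQL NULL must be mapped to a real Java null rather than surfaced as a literal string. The helper below assumes that token is "NULL"; the class and method are hypothetical.

```java
// Hypothetical illustration of the HIVE-1606 fix's intent: a serialized
// null marker in a fetched string column should become a Java null, not
// the four-character string "NULL".
public class NullColumn {
    public static String decode(String serialized) {
        return "NULL".equals(serialized) ? null : serialized;
    }
}
```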




[jira] Commented: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-27 Thread Steven Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915452#action_12915452
 ] 

Steven Wong commented on HIVE-1378:
---

Thanks, Ning/Zheng/John!

> Return value for map, array, and struct needs to return a string 
> -
>
> Key: HIVE-1378
> URL: https://issues.apache.org/jira/browse/HIVE-1378
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Drivers
>Reporter: Jerome Boulon
>Assignee: Steven Wong
> Fix For: 0.7.0
>
> Attachments: HIVE-1378.1.patch, HIVE-1378.2.patch, HIVE-1378.3.patch, 
> HIVE-1378.4.patch, HIVE-1378.5.patch, HIVE-1378.6.patch, HIVE-1378.7.patch, 
> HIVE-1378.patch
>
>
> In order to be able to select/display any data from JDBC Hive driver, return 
> value for map, array, and struct needs to return a string
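
The conversion the issue asks for can be sketched as a recursive renderer from Java collections to a display string. The JSON-like formatting below is an assumption for illustration; the actual HIVE-1378 patch defines its own representation.

```java
import java.util.List;
import java.util.Map;

// Sketch only: render map and array (list) values as strings so a JDBC
// client can display them; struct values would be handled analogously.
public class ComplexTypeToString {
    public static String render(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(",");
                first = false;
                sb.append(e.getKey()).append(":").append(render(e.getValue()));
            }
            return sb.append("}").toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append(render(list.get(i)));
            }
            return sb.append("]").toString();
        }
        return String.valueOf(value);   // primitives fall back to toString
    }
}
```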




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440
 ] 

Namit Jain commented on HIVE-1671:
--

Are you using HiveServer?

bq. we having 2 threads running at 100%

What do you mean by the above? If you are setting hive.exec.parallel to true, 
I can see the problem happening.




[jira] Issue Comment Edited: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440
 ] 

Namit Jain edited comment on HIVE-1671 at 9/27/10 3:22 PM:
---

Are you using HiveServer?

>> we having 2 threads running at 100%

What do you mean by the above? If you are setting hive.exec.parallel to true, 
I can see the problem happening.

  was (Author: namit):
Are your using HiveServer ?

.bq we having 2 threads running at 100%

What do you mean by the above ? Are you setting hive.exec.parallel to true, in 
which case, I can see the problem happening ?
  



[jira] Updated: (HIVE-1378) Return value for map, array, and struct needs to return a string

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1378:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed. Thanks Steven!




[jira] Updated: (HIVE-1669) non-deterministic display of storage parameter in test

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1669:
-

Parent: HIVE-1658
Issue Type: Sub-task  (was: Test)

> non-deterministic display of storage parameter in test
> --
>
> Key: HIVE-1669
> URL: https://issues.apache.org/jira/browse/HIVE-1669
> Project: Hadoop Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>
> With the change to beautify 'desc extended table', the storage parameters 
> are displayed in a non-deterministic manner (since the underlying 
> implementation is a HashMap). 




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-27 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915437#action_12915437
 ] 

Ning Zhang commented on HIVE-1658:
--

Another issue is that 'desc extended' now displays table/partition parameters 
on separate lines. Since the parameters use an unordered map implementation, 
they are displayed in a non-deterministic order. It would be great if the 
pretty-printing operator took care of ordering as well.
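
The ordering fix suggested here can be sketched in a few lines (an assumption for illustration, not the actual Hive change): copy the parameter HashMap into a TreeMap before rendering, so output order no longer depends on hash iteration.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a TreeMap iterates its keys in sorted order, which makes the
// rendered parameter list deterministic and test-friendly.
public class ParamDisplay {
    public static String render(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```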

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<TAB>type<TAB>comment
> to be in line with the previous formatting style, for backward compatibility.




[jira] Commented: (HIVE-1663) ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty

2010-09-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915435#action_12915435
 ] 

He Yongqiang commented on HIVE-1663:


Sorry that I committed this myself; I cannot generate a patch for it (the 
file is empty). 

> ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty
> --
>
> Key: HIVE-1663
> URL: https://issues.apache.org/jira/browse/HIVE-1663
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
>
> We should remove this empty file.




[jira] Resolved: (HIVE-1663) ql/src/java/org/apache/hadoop/hive/ql/parse/SamplePruner.java is empty

2010-09-27 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1663.


Fix Version/s: 0.7.0
   Resolution: Fixed

fixed. 




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-27 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1361:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed! Thanks Ning and Ahmed!

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.4.java_only.patch, HIVE-1361.4.patch, 
> HIVE-1361.5.java_only.patch, HIVE-1361.5.patch, HIVE-1361.java_only.patch, 
> HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
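
The proposed partition-level stats can be sketched as a simple aggregation over file lengths (illustrative only; Hive gathers these on the fly during inserts or via the proposed ANALYZE TABLE statement, and the class here is hypothetical):

```java
import java.util.List;

// Sketch of the partition-level stats proposed above: number of files,
// total size, and max/min/average file sizes, computed from file lengths.
public class PartitionStats {
    public final long numFiles;
    public final long totalSize;
    public final long maxFileSize;
    public final long minFileSize;
    public final double avgFileSize;

    public PartitionStats(List<Long> fileSizes) {
        if (fileSizes.isEmpty()) {
            numFiles = 0; totalSize = 0; maxFileSize = 0; minFileSize = 0;
            avgFileSize = 0.0;
            return;
        }
        long total = 0, max = Long.MIN_VALUE, min = Long.MAX_VALUE;
        for (long s : fileSizes) {
            total += s;
            max = Math.max(max, s);
            min = Math.min(min, s);
        }
        numFiles = fileSizes.size();
        totalSize = total;
        maxFileSize = max;
        minFileSize = min;
        avgFileSize = (double) total / fileSizes.size();
    }
}
```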




RE: Logging level

2010-09-27 Thread Steven Wong
Hmm, it doesn't seem possible before MAPREDUCE-336 is fixed.


-Original Message-
From: Steven Wong [mailto:sw...@netflix.com] 
Sent: Monday, September 27, 2010 10:31 AM
To: hive-u...@hadoop.apache.org
Cc: 
Subject: RE: Logging level

Yes, I need a way to change the logging level of map/reduce tasks for the jobs 
of my Hive query, without affecting the logging level of other jobs or Hadoop 
daemons. How do I set hadoop.root.logger to do that?


-Original Message-
From: Shrijeet Paliwal [mailto:shrij...@rocketfuel.com] 
Sent: Sunday, September 26, 2010 1:51 AM
To: hive-u...@hadoop.apache.org
Cc: 
Subject: Re: Logging level

You need to set the Hadoop logging level to DEBUG if you are looking at
map task logs. I guess hadoop.root.logger is your friend.

On Sun, Sep 26, 2010 at 12:06 AM, Steven Wong  wrote:
>
> Tried your suggestion, but it still logs at INFO level in 
> /mnt/var/log/hadoop/userlogs/attempt_201008140123_5752_m_18_0/syslog. Am 
> I looking in the wrong file?
>
>
>
>
>
> From: Ning Zhang [mailto:nzh...@facebook.com]
> Sent: Saturday, September 25, 2010 11:45 PM
> To: 
> Cc: 
> Subject: Re: Logging level
>
>
>
> hive -hiveconf hive.root.logger=DEBUG,DRFA
>
>
>
> On Sep 25, 2010, at 11:27 PM, Steven Wong wrote:
>
> How can I control the logging level of Hive code that runs in the 
> mappers/reducers? For example, how to set it to DEBUG?
>
>
>
> Thanks.
>
> Steven
>
>
>
>




[jira] Commented: (HIVE-1624) Patch to allows scripts in S3 location

2010-09-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915391#action_12915391
 ] 

He Yongqiang commented on HIVE-1624:


Great. Some nitpicks; sorry for not posting them in the previous comment. 
1) It seems there is still one leftover logging call: 
getConsole().printInfo("Testing " + value);
2) Also, can you add one JUnit test for DosToUnix?
3) Do you think it may be better to move fetchFilesNotInLocalFilesystem to 
SessionState?
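
The requested DosToUnix check could look roughly like this. The real DosToUnix utility in the HIVE-1624 patch operates on files; the string-level equivalent below is a hypothetical stand-in so the sketch stays self-contained, with the JUnit assertion expressed as a plain assert:

```java
// Hypothetical string-level sketch of a DosToUnix conversion: drop carriage
// returns so CRLF (DOS) line endings become plain LF (Unix) endings.
public class DosToUnixSketch {
    public static String dosToUnix(String s) {
        return s.replace("\r\n", "\n");
    }
}
```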

> Patch to allows scripts in S3 location
> --
>
> Key: HIVE-1624
> URL: https://issues.apache.org/jira/browse/HIVE-1624
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1624-2.patch, HIVE-1624-3.patch, HIVE-1624.patch
>
>
> I want to submit a patch which allows users to run scripts located in S3.
> This patch enables Hive to download the hive scripts located in S3 buckets 
> and execute them. This saves users the effort of copying scripts to HDFS 
> before executing them.
> Thanks
> Vaibhav




[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1157:
-

Status: Patch Available  (was: Open)




Review Request: HIVE-1157: UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/910/
---

Review request for Hive Developers.


Summary
---

UDFs can't be loaded via "add jar" when jar is on HDFS


This addresses bug HIVE-1157.
http://issues.apache.org/jira/browse/HIVE-1157


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java b5cd1ef 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java a36b06e 
  ql/src/test/org/apache/hadoop/hive/ql/session/TestAddJarFromHDFS.java 
PRE-CREATION 

Diff: http://review.cloudera.org/r/910/diff


Testing
---


Thanks,

Carl



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1157:
-

Attachment: HIVE-1157.patch.v5.txt

Attaching an updated version of Phil's patch that applies cleanly with -p0

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem? Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}
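
The approach agreed on above (recognize a remote jar, copy it locally, then load it) can be sketched with plain JDK types. This is a hypothetical illustration, not the actual patch: the real change uses Hadoop's FileSystem API to copy from HDFS and Hive's SessionState to manage the class loader; here java.nio stands in for the HDFS copy so the sketch is self-contained, and all names are invented.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AddJarSketch {

    // Returns a local path for the jar, "localizing" it first when the URI
    // carries a remote scheme such as hdfs://.
    static Path localize(URI jarUri, Path localTmpDir) throws IOException {
        String scheme = jarUri.getScheme();
        if (scheme == null || scheme.equals("file")) {
            // Already on the local filesystem: use the jar in place.
            return Paths.get(jarUri.getPath());
        }
        // Remote scheme: copy to a local temp dir first. Files.copy is a
        // stand-in for Hadoop's FileSystem.copyToLocalFile(...).
        Path local = localTmpDir.resolve(Paths.get(jarUri.getPath()).getFileName());
        Files.copy(Paths.get(jarUri.getPath()), local, StandardCopyOption.REPLACE_EXISTING);
        return local;
    }

    static URLClassLoader addToClassPath(ClassLoader parent, Path localJar) throws IOException {
        // URLClassLoader understands only local URLs, which is why the jar
        // must be localized before this point.
        return new URLClassLoader(new URL[] { localJar.toUri().toURL() }, parent);
    }
}
```

The alternative mentioned in the thread, teaching the class loader to read straight from HDFS, would require a custom URLStreamHandler and is the more fragile option, as Philip notes.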




[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-27 Thread Vaibhav Aggarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915383#action_12915383
 ] 

Vaibhav Aggarwal commented on HIVE-1620:


Hey Namit

Please let me know what you think of this.

Thanks
Vaibhav

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows user to write files directly 
> to S3.
> This patch allow user to specify an S3 location as the table output location 
> and hence eliminates the need  of copying data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate hive with S3 better and quicker.




RE: Logging level

2010-09-27 Thread Steven Wong
Yes, I need a way to change the logging level of map/reduce tasks for the jobs 
of my Hive query, without affecting the logging level of other jobs or Hadoop 
daemons. How do I set hadoop.root.logger to do that?


-Original Message-
From: Shrijeet Paliwal [mailto:shrij...@rocketfuel.com] 
Sent: Sunday, September 26, 2010 1:51 AM
To: hive-u...@hadoop.apache.org
Cc: 
Subject: Re: Logging level

You need to set the Hadoop logging level to DEBUG if you are looking at
map task logs. I guess hadoop.root.logger is your friend.

On Sun, Sep 26, 2010 at 12:06 AM, Steven Wong  wrote:
>
> Tried your suggestion, but it still logs at INFO level in 
> /mnt/var/log/hadoop/userlogs/attempt_201008140123_5752_m_18_0/syslog. Am 
> I looking in the wrong file?
>
>
>
>
>
> From: Ning Zhang [mailto:nzh...@facebook.com]
> Sent: Saturday, September 25, 2010 11:45 PM
> To: 
> Cc: 
> Subject: Re: Logging level
>
>
>
> hive -hiveconf hive.root.logger=DEBUG,DRFA
>
>
>
> On Sep 25, 2010, at 11:27 PM, Steven Wong wrote:
>
> How can I control the logging level of Hive code that runs in the 
> mappers/reducers? For example, how to set it to DEBUG?
>
>
>
> Thanks.
>
> Steven
>
>
>
>
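
One possible way to scope Shrijeet's pointer to a single query is to pass the logger setting through the task JVM options for just that session. This is an unverified sketch for a Hadoop 0.20-era cluster: it assumes the task syslog appender is TLA (TaskLogAppender) and that a later -Dhadoop.root.logger in mapred.child.java.opts overrides the launcher's default; both are assumptions worth checking against your Hadoop version.

```sql
-- Hypothetical, session-scoped: only jobs launched by this Hive session are affected.
-- Keep the -Xmx you normally use, since setting mapred.child.java.opts replaces the default.
SET mapred.child.java.opts=-Xmx200m -Dhadoop.root.logger=DEBUG,TLA;
-- ... run the query whose task logs you want at DEBUG ...
-- then restore:
SET mapred.child.java.opts=-Xmx200m;
```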



[jira] Updated: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated HIVE-1157:
--

Attachment: HIVE-1157.patch.v4.txt

Carl,

I updated the patch to current trunk.

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.v2.patch.txt, output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem? Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}




[jira] Resolved: (HIVE-1582) merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'

2010-09-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1582.
--

Resolution: Not A Problem

Talked to Namit and Yongqiang; this is not a bug. INSERT OVERWRITE to an (HDFS) 
directory should be merged as before. INSERT OVERWRITE LOCAL DIRECTORY cannot 
be merged, and that case does not apply here. 

> merge mapfiles task behaves incorrectly for 'inserting overwrite directory...'
> --
>
> Key: HIVE-1582
> URL: https://issues.apache.org/jira/browse/HIVE-1582
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> hive> 
> > 
> > 
> >  SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> hive>SET hive.exec.compress.output=false;
> hive>INSERT OVERWRITE DIRECTORY 'x'
> >  SELECT  from  a;
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks is set to 0 since there's no reduce operator
> ..
> Ended Job = job_201008191557_54169
> Ended Job = 450290112, job is filtered out (removed at runtime).
> Launching Job 2 out of 2
> .
> the second job should not get started.




Build failed in Hudson: Hive-trunk-h0.19 #552

2010-09-27 Thread Apache Hudson Server
See 

--
[...truncated 12240 lines...]
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit

Build failed in Hudson: Hive-trunk-h0.18 #552

2010-09-27 Thread Apache Hudson Server
See 

--
[...truncated 30335 lines...]
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: defa...@srcbucket2
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Copying data from 

[junit] Loading data to table src1
[junit

[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915275#action_12915275
 ] 

HBase Review Board commented on HIVE-1671:
--

Message from: "Bennie Schut" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/909/
---

Review request for Hive Developers.


Summary
---

simple change HashMap into ConcurrentHashMap


This addresses bug HIVE-1671.
http://issues.apache.org/jira/browse/HIVE-1671


Diffs
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1001658 

Diff: http://review.cloudera.org/r/909/diff


Testing
---


Thanks,

Bennie




> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> we have 2 threads running at 100%
> With a stacktrace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Updated: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1671:
---

Attachment: HIVE-1671-1.patch

No tests are added. It would be difficult to time a test to reproduce this, and 
it would only show that a plain HashMap isn't thread safe, which we already know.

> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> we have 2 threads running at 100%
> With a stacktrace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Assigned: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut reassigned HIVE-1671:
--

Assignee: Bennie Schut

> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
>
> we have 2 threads running at 100%
> With a stacktrace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915262#action_12915262
 ] 

Bennie Schut commented on HIVE-1671:


Perhaps because we now have multiple subqueries running in Hive for the same 
overall query, we can have concurrent use of this map.
We could simply fix this by using a ConcurrentHashMap:


  private Map<String, ContentSummary> pathToCS = new ConcurrentHashMap<String, ContentSummary>();
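
A minimal, self-contained illustration of why the swap works (hypothetical class and field names; Long stands in for Hive's ContentSummary): several threads read and write the shared map concurrently, which is safe with ConcurrentHashMap but can corrupt a plain HashMap's bucket chains during a concurrent rehash and leave get() spinning, matching the 100%-CPU stack trace in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PathSummaryCache {
    // In Hive's Context this maps an input path to its ContentSummary;
    // Long stands in for the summary so the sketch stays dependency-free.
    private final Map<String, Long> pathToSize = new ConcurrentHashMap<String, Long>();

    public Long get(String path) {
        return pathToSize.get(path);
    }

    public void put(String path, long size) {
        pathToSize.put(path, size);
    }

    public int size() {
        return pathToSize.size();
    }

    // Hammer the cache from several threads at once, as parallel subquery
    // tasks do. Safe with ConcurrentHashMap; with a plain HashMap the same
    // access pattern is undefined and can loop forever inside get().
    public static PathSummaryCache stress(int threads, int keysPerThread) throws InterruptedException {
        final PathSummaryCache cache = new PathSummaryCache();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(new Runnable() {
                public void run() {
                    for (int k = 0; k < keysPerThread; k++) {
                        cache.put("/warehouse/part-" + id + "-" + k, k);
                        cache.get("/warehouse/part-" + id + "-" + k);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return cache;
    }
}
```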


> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
> Fix For: 0.7.0
>
>
> we have 2 threads running at 100%
> With a stacktrace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Created: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Bennie Schut (JIRA)
multithreading on Context.pathToCS
--

 Key: HIVE-1671
 URL: https://issues.apache.org/jira/browse/HIVE-1671
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Bennie Schut
 Fix For: 0.7.0


we have 2 threads running at 100%

With a stacktrace like this:

"Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
[0x442eb000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap.get(HashMap.java:303)
at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
at 
org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
at 
org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

