Re: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Jeff Zhang
Great, I can work on this issue.




On Sat, Jun 12, 2010 at 2:02 PM, Jeff Hammerbacher  wrote:
> See https://issues.apache.org/jira/browse/HIVE-1402.
>
> On Fri, Jun 11, 2010 at 1:22 PM, John Sichi  wrote:
>
>> If someone is interested in adding parallel ORDER BY to Hive (using
>> TotalOrderPartitioner), here's a good starting point:
>>
>> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>>
>> The goal would be to take that manual two-step sample-then-sort process and
>> turn it into an automatic plan within Hive.  I have a better example for the
>> sampling query which I haven't published yet.
>>
>> We would also need to name the final output files in such a way that the
>> total order could be iterated via the filenames.
>>
>



-- 
Best Regards

Jeff Zhang


[jira] Commented: (HIVE-1402) Add parallel ORDER BY to Hive

2010-06-11 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878213#action_12878213
 ] 

Jeff Hammerbacher commented on HIVE-1402:
-

From Ning Zhang:

bq. order by is supported in trunk with certain limitations in strict mode (has 
to have a limit)

From John Sichi:

> If someone is interested in adding parallel ORDER BY to Hive (using 
> TotalOrderPartitioner), here's a good starting point: 
> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
> 
> The goal would be to take that manual two-step sample-then-sort process and 
> turn it into an automatic plan within Hive.  I have a better example for the 
> sampling query which I haven't published yet.
> 
> We would also need to name the final output files in such a way that the 
> total order could be iterated via the filenames.

> Add parallel ORDER BY to Hive
> -
>
> Key: HIVE-1402
> URL: https://issues.apache.org/jira/browse/HIVE-1402
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jeff Hammerbacher
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Jeff Hammerbacher
See https://issues.apache.org/jira/browse/HIVE-1402.

On Fri, Jun 11, 2010 at 1:22 PM, John Sichi  wrote:

> If someone is interested in adding parallel ORDER BY to Hive (using
> TotalOrderPartitioner), here's a good starting point:
>
> http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad
>
> The goal would be to take that manual two-step sample-then-sort process and
> turn it into an automatic plan within Hive.  I have a better example for the
> sampling query which I haven't published yet.
>
> We would also need to name the final output files in such a way that the
> total order could be iterated via the filenames.
>


[jira] Created: (HIVE-1402) Add parallel ORDER BY to Hive

2010-06-11 Thread Jeff Hammerbacher (JIRA)
Add parallel ORDER BY to Hive
-

 Key: HIVE-1402
 URL: https://issues.apache.org/jira/browse/HIVE-1402
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Jeff Hammerbacher







RE: how to set the debug parameters of hive?

2010-06-11 Thread Ashish Thusoo
I think if you just pass the java parameters on the command line it should just 
work, i.e. bin/hive  and your parameters. I have not tried it though; mostly 
I am just able to debug using Eclipse. (You can create the related Eclipse files 
by doing:

cd metastore
ant model-jar
cd ..
ant eclipse-files)

Ashish

-Original Message-
From: Zhou Shuaifeng [mailto:zhoushuaif...@huawei.com] 
Sent: Friday, June 11, 2010 12:00 AM
To: hive-dev@hadoop.apache.org
Cc: ac.pi...@huawei.com
Subject: how to set the debug parameters of hive?

Hi, I want to debug Hive remotely; how do I set the config?
E.g., debugging HDFS works by setting DEBUG_PARAMETERS in the file 'bin/hadoop', 
so how do I set the debug parameters of Hive?
Thanks a lot.
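One common approach is a sketch like the following, under the assumption (not verified against this exact build) that bin/hive runs through the hadoop launcher script, which appends $HADOOP_OPTS to the JVM command line:

```shell
# Assumption: bin/hive execs the hadoop launcher, which honors $HADOOP_OPTS;
# the JDWP agent then waits for a remote debugger to attach on port 8000.
export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000"
bin/hive
# attach from Eclipse via Run > Debug Configurations > Remote Java Application,
# pointing at port 8000 on the machine running the CLI
```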



-
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!

 


[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-11 Thread Mayank Lahiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Lahiri updated HIVE-1397:


Attachment: Histogram_quality.png.jpg

Since there is no approximation guarantee with this streaming algorithm, I ran 
a pretty comprehensive set of experiments to determine the quality of the 
computed histogram in terms of mean-squared error (MSE), using R's hist() 
function as a gold standard.

I simulated a variety of conditions on the data and the Map/Reduce setup. To 
summarize, I ran multiple repetitions of histogram-building using the UDAF and 
R on the cross product:

 {chunks of sorted data to each mapper, semi-sorted data to each mapper, random 
data to each mapper} x
 { few mappers relative to data size, medium number of mappers, very high 
number of mappers } x
 { 10 - 80 histogram bins in steps of 10 }

The image shows the MSE of the UDAF histogram() relative to R's histogram 
for the same number of bins. A value of 1.0 means the Hive and R histograms 
are on par; values less than 1.0 mean the Hive histogram is actually *better* 
than R's, in terms of MSE.

At 30 and 60 bins, the MSEs of both R and Hive are very small, and the 
discrepancy seems to be R doing something funky at those points.

In summary, it looks pretty good!

> histogram() UDAF for a numerical column
> ---
>
> Key: HIVE-1397
> URL: https://issues.apache.org/jira/browse/HIVE-1397
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Mayank Lahiri
>Assignee: Mayank Lahiri
> Fix For: 0.6.0
>
> Attachments: Histogram_quality.png.jpg, HIVE-1397.1.patch
>
>
> A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
> short, double, long, etc.) column. The result is returned as a map of (x,y) 
> histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
> The algorithm is currently adapted from "A streaming parallel decision tree 
> algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
> proportional to the number of histogram bins specified. It has no 
> approximation guarantees, but seems to work well when there is a lot of data 
> and a large number (e.g. 50-100) of histogram bins specified.
> A typical call might be:
> SELECT histogram(val, 10) FROM some_table;
> where the result would be a histogram with 10 bins, returned as a Hive map 
> object.
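The merge rule at the heart of that streaming algorithm is easy to sketch. The following Python class is an illustrative reimplementation of the bin-merging idea from the Ben-Haim & Tom-Tov paper, not the Hive UDAF source:

```python
import bisect

class StreamingHistogram:
    """Sketch of the streaming histogram idea from Ben-Haim & Tom-Tov.

    Keeps at most nbins (centroid, count) pairs; when the budget is
    exceeded, the two adjacent bins with the closest centroids are merged
    into their weighted average. Illustrative only, not the Hive UDAF.
    """

    def __init__(self, nbins):
        self.nbins = nbins
        self.bins = []  # sorted list of [centroid, count] pairs

    def add(self, x):
        bisect.insort(self.bins, [float(x), 1])
        if len(self.bins) > self.nbins:
            # find the adjacent pair with the smallest centroid gap
            i = min(range(len(self.bins) - 1),
                    key=lambda j: self.bins[j + 1][0] - self.bins[j][0])
            (c1, n1), (c2, n2) = self.bins[i], self.bins[i + 1]
            self.bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def as_map(self):
        # same shape as the UDAF result: a map of (x, y) histogram pairs
        return {c: n for c, n in self.bins}
```

Because bins are merged rather than dropped, the total count is preserved even though individual bin boundaries are approximate.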




RE: Hive-Hbase integration problem, ask for help

2010-06-11 Thread John Sichi
You should not be specifying any ROW FORMAT for an HBase table.

From the log in your earlier post, I couldn't tell what was going wrong; I 
don't think it contained the full exception stacks. You might be able to dig 
around in the actual log files to find more.
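As a concrete sketch, the table from the quoted message could be declared without the ROW FORMAT clause (same columns and mapping as below; untested here):

```sql
-- HBase-backed table: the storage handler and SerDe properties control
-- serialization, so no ROW FORMAT clause is needed
CREATE TABLE hive_zsf1(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");
```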

JVS

From: Zhou Shuaifeng [zhoushuaif...@huawei.com]
Sent: Thursday, June 10, 2010 7:26 PM
To: hive-dev@hadoop.apache.org
Cc: 'zhaozhifeng 00129982'
Subject: Hive-Hbase integration problem, ask for help

Hi Guys,

I downloaded the Hive source from the SVN server, built it, and tried to run
the Hive-HBase integration.

It works well on all file-based Hive tables, but on the HBase-based tables
the 'insert' command can't run successfully. The 'select' command runs fine.

error info is below:

hive> INSERT OVERWRITE TABLE hive_zsf SELECT * FROM zsf WHERE id=3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201006081948_0021, Tracking URL =
http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0021
Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
-Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0021
2010-06-09 16:05:43,898 Stage-0 map = 0%,  reduce = 0%
2010-06-09 16:06:12,131 Stage-0 map = 100%,  reduce = 100%
Ended Job = job_201006081948_0021 with errors

Task with the most failures(4):
-
Task ID:
  task_201006081948_0021_m_00

URL:
  http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021
 &tipid=task_201006081948_0021_m_00
-

FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver




I created an HBase-based table with Hive, put some data into the HBase table
through the HBase shell, and can select data from it through Hive:

CREATE TABLE hive_zsf1(id int, name string) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");

hbase(main):001:0> scan 'hive_zsf1'
ROW  COLUMN+CELL

 1   column=cf1:val, timestamp=1276157509028,
value=zsf
 2   column=cf1:val, timestamp=1276157539051,
value=zzf
 3   column=cf1:val, timestamp=1276157548247,
value=zw
 4   column=cf1:val, timestamp=1276157557115,
value=cjl
4 row(s) in 0.0470 seconds
hbase(main):002:0>

hive> select * from hive_zsf1 where id=3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201006081948_0038, Tracking URL =
http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0038
Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job
-Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0038
2010-06-11 10:25:42,049 Stage-1 map = 0%,  reduce = 0%
2010-06-11 10:25:45,090 Stage-1 map = 100%,  reduce = 0%
2010-06-11 10:25:48,133 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201006081948_0038
OK
3   zw
Time taken: 13.526 seconds
hive>









RE: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread John Sichi
If someone is interested in adding parallel ORDER BY to Hive (using 
TotalOrderPartitioner), here's a good starting point:

http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad

The goal would be to take that manual two-step sample-then-sort process and 
turn it into an automatic plan within Hive.  I have a better example for the 
sampling query which I haven't published yet.

We would also need to name the final output files in such a way that the total 
order could be iterated via the filenames.
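The manual two-step plan above can be sketched outside Hadoop. The following Python function is illustrative only (the function name and parameters are invented, not Hive or Hadoop API): sample keys, derive quantile cut points (the role of TotalOrderPartitioner's partition file), route records to partitions, sort within each, and emit partitions in index order so concatenating output files 0..n-1 yields a total order.

```python
import random

def total_order_sort(records, key, num_reducers, sample_size=100, seed=0):
    # Step 1: sample the keys and sort the sample
    rng = random.Random(seed)
    sample = sorted(key(r) for r in
                    rng.sample(records, min(sample_size, len(records))))
    # Step 2: one cut point between each pair of reducers, at even quantiles
    cuts = [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]
    # Step 3: route each record to the partition holding its key range
    parts = [[] for _ in range(num_reducers)]
    for r in records:
        p = sum(key(r) >= c for c in cuts)  # index of the matching range
        parts[p].append(r)
    # Step 4: sort within each partition; partition i holds the i-th key
    # range, so reading partitions in index order gives a total order
    return [sorted(p, key=key) for p in parts]
```

Concatenating the returned partitions in order produces a fully sorted sequence, which is exactly the property the output-file naming scheme would need to expose.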

JVS


From: Ning Zhang [nzh...@facebook.com]
Sent: Friday, June 11, 2010 12:40 PM
To: 'hive-u...@hadoop.apache.org'
Cc: 'hive-dev@hadoop.apache.org'
Subject: Re: Is anybody working on the globally "order by" of hive ?

Good idea, Edward. It would definitely be better if it is what it sounds to be.

Btw Jeff, order by is supported in trunk with certain limitations in strict 
mode (it has to have a limit). I will be able to update the wiki when I come back.
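In strict mode that looks something like the following (sketch; src is a stand-in table name):

```sql
SET hive.mapred.mode=strict;
-- in strict mode, ORDER BY must be paired with a LIMIT, since the
-- global sort currently runs in a single reducer
SELECT key, value FROM src ORDER BY key LIMIT 100;
```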

Thanks,
Ning
--
Sent from my blackberry


From: Edward Capriolo 
To: hive-u...@hadoop.apache.org 
Cc: hive-dev@hadoop.apache.org 
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?


On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang 
mailto:zjf...@gmail.com>> wrote:
Hi all,

From the Hive wiki, Hive does not have a global "order by"
feature; the sort by of Hive is per reducer. Our team thinks a global
"order by" is an important feature for users, so we are wondering whether
anybody is working on it? I am very interested in being involved.


--
Best Regards

Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in 
this. As of now order by sets reduce tasks to 1 :)

Edward


Re: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Ning Zhang
Good idea, Edward. It would definitely be better if it is what it sounds to be.

Btw Jeff, order by is supported in trunk with certain limitations in strict 
mode (it has to have a limit). I will be able to update the wiki when I come back.

Thanks,
Ning
--
Sent from my blackberry


From: Edward Capriolo 
To: hive-u...@hadoop.apache.org 
Cc: hive-dev@hadoop.apache.org 
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?


On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang 
mailto:zjf...@gmail.com>> wrote:
Hi all,

From the Hive wiki, Hive does not have a global "order by"
feature; the sort by of Hive is per reducer. Our team thinks a global
"order by" is an important feature for users, so we are wondering whether
anybody is working on it? I am very interested in being involved.


--
Best Regards

Jeff Zhang

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in 
this. As of now order by sets reduce tasks to 1 :)

Edward


Re: Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Edward Capriolo
On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang  wrote:

> Hi all,
>
> From the Hive wiki, Hive does not have a global "order by"
> feature; the sort by of Hive is per reducer. Our team thinks a global
> "order by" is an important feature for users, so we are wondering whether
> anybody is working on it? I am very interested in being involved.
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Jeff,

I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in
this. As of now order by sets reduce tasks to 1 :)

Edward


Build failed in Hudson: Hive-trunk-h0.19 #468

2010-06-11 Thread Apache Hudson Server
See 

--
[...truncated 6865 lines...]
[junit] plan = 

[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: louter_join_ppr.q
[junit] Begin query: mapjoin_subquery.q
[junit] plan = 

[junit] plan = 

[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: mapjoin_subquery.q
[junit] Begin query: mapreduce1.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: mapreduce1.q
[junit] Begin query: mapreduce2.q
[junit] plan = 

[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: mapreduce2.q
[junit] Begin query: mapreduce3.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: mapreduce3.q
[junit] Begin query: mapreduce4.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I a

[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Status: Patch Available  (was: Open)

> Move hive language manual and all wiki based documentation to forest
> 
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1335-1.patch.txt, jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1335-1.patch.txt

Patch: the docs command runs Anakia from the xdocs directory.

> Move hive language manual and all wiki based documentation to forest
> 
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1335-1.patch.txt, jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Fix Version/s: 0.6.0
Affects Version/s: 0.5.0

> Move hive language manual and all wiki based documentation to forest
> 
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1335-1.patch.txt, jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




Hudson build is back to normal : Hive-trunk-h0.17 #466

2010-06-11 Thread Apache Hudson Server
See 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: jdom-1.1.jar

> Move hive language manual and all wiki based documentation to forest
> 
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: jdom-1.1.jar
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, or 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way hadoop & hbase do, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




Build failed in Hudson: Hive-trunk-h0.18 #469

2010-06-11 Thread Apache Hudson Server
See 

--
Started by timer
Building remotely on minerva.apache.org (Ubuntu)
Checking out http://svn.apache.org/repos/asf/hadoop/hive/trunk
ERROR: Failed to check out http://svn.apache.org/repos/asf/hadoop/hive/trunk
org.tmatesoft.svn.core.SVNException: svn: Connection reset
svn: OPTIONS request failed on '/repos/asf/hadoop/hive/trunk'
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:103)
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:87)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:616)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:273)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:261)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:516)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:98)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1001)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getLatestRevision(DAVRepository.java:178)
at 
org.tmatesoft.svn.core.wc.SVNBasicClient.getRevisionNumber(SVNBasicClient.java:482)
at 
org.tmatesoft.svn.core.wc.SVNBasicClient.getLocations(SVNBasicClient.java:851)
at 
org.tmatesoft.svn.core.wc.SVNBasicClient.createRepository(SVNBasicClient.java:534)
at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doCheckout(SVNUpdateClient.java:893)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:740)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:660)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2018)
at hudson.remoting.UserRequest.perform(UserRequest.java:114)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:270)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPParser.readPlainLine(HTTPParser.java:69)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPParser.readLine(HTTPParser.java:51)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPParser.parseStatus(HTTPParser.java:39)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.readHeader(HTTPConnection.java:195)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPRequest.dispatch(HTTPRequest.java:175)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:345)
... 22 more
Recording test results



[jira] Commented: (HIVE-1082) "create table if not exists " should check if the specified schema matches the existing schema, and throw an error if it doesnt.

2010-06-11 Thread Soundararajan Velu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877769#action_12877769
 ] 

Soundararajan Velu commented on HIVE-1082:
--

Thanks Ning, that makes sense. At the moment we have added code to log a WARN, 
but using CREATE OR REPLACE TABLE is the right approach as you suggest; we will 
update if we get this done.

> "create table if not exists " should check if the specified  schema matches 
> the existing schema, and throw an error if it doesnt.  
> ---
>
> Key: HIVE-1082
> URL: https://issues.apache.org/jira/browse/HIVE-1082
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Venky Iyer
>
> I think we should make sure that the table can either be created anew with 
> the specified properties, or it already exists with those properties, where 
> 'properties' includes all metadata except timestamps. Anything else is an 
> error. This makes sense if you think of a table as the name  + its schema, 
> instead of the name alone. 




Is anybody working on the globally "order by" of hive ?

2010-06-11 Thread Jeff Zhang
Hi all,

From the Hive wiki, Hive does not have a global "order by"
feature; the sort by of Hive is per reducer. Our team thinks a global
"order by" is an important feature for users, so we are wondering whether
anybody is working on it? I am very interested in being involved.


-- 
Best Regards

Jeff Zhang


[jira] Updated: (HIVE-543) provide option to run hive in local mode

2010-06-11 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-543:
---

Attachment: hive-534.patch.2

Log jobconf parameters from the child JVM in local mode; this emulates the fact 
that we can look at job.xml from the TT/JT.

> provide option to run hive in local mode
> 
>
> Key: HIVE-543
> URL: https://issues.apache.org/jira/browse/HIVE-543
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Attachments: hive-534.patch.2, hive-543.patch.1
>
>
> this is a little bit more than just mapred.job.tracker=local.
> when run in this mode, multiple jobs are an issue since writing to the same tmp 
> directories causes conflicts. the following options:
> hadoop.tmp.dir
> mapred.local.dir
> need to be randomized (perhaps based on queryid). 




[jira] Resolved: (HIVE-88) hadoop doesn't use conf/hive-log4j.properties

2010-06-11 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved HIVE-88.
---

Resolution: Duplicate

> hadoop doesn't use conf/hive-log4j.properties
> -
>
> Key: HIVE-88
> URL: https://issues.apache.org/jira/browse/HIVE-88
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Michi Mutsuzaki
>Assignee: Joydeep Sen Sarma
>Priority: Minor
>
> hadoop-0.20.0-dev-core.jar contains a log4j.properties file, and I think that's 
> the one hadoop is picking up. I modified both conf/hive-log4j.properties and 
> hadoopcore/conf/log4j.properties, but hadoop still printed INFO messages to 
> stderr.
> Pasting relevant posts from the mailing list below:
> Michi MutsuzakiFri, Nov 28, 2008 at 7:14 PM
> To: hive-us...@publists.facebook.com
> Hello,
> When I do "ant test" under ql directory, I get many log messages to stderr.
>[junit] 08/11/28 19:04:14 INFO exec.MapOperator: Got partitions: null
>[junit] 08/11/28 19:04:14 INFO exec.ReduceSinkOperator: Initializing Self
>[junit] 08/11/28 19:04:14 INFO exec.ReduceSinkOperator: Using tag = -1
>[junit] 08/11/28 19:04:14 INFO thrift.TBinarySortableProtocol:
> Sort order is ""
>[junit] 08/11/28 19:04:14 INFO thrift.TBinarySortableProtocol:
> Sort order is ""
>
> I tried setting log level to ERROR in conf/hive-log4j.properties, but these 
> info lines still show up. How can I get rid of them?
> Thanks!
> --Michi
> Joydeep Sen Sarma   Fri, Nov 28, 2008 at 10:49 PM
> To: "mi...@cs.stanford.edu" , 
> "hive-us...@publists.facebook.com" 
> When we run the tests - we run in hadoop 'local' mode - and in this mode, we 
> run map-reduce jobs by invoking 'hadoop jar ... ExecDriver' cmd line. this 
> was done because we had some issues submitting map-reduce jobs directly (from 
> same jvm) in local mode that we could not resolve.
> The issue is that when we invoke 'hadoop jar ... ExecDriver' - we don't 
> control log4j via hive-log4j. one thing you can try is changing hadoop's 
> log4j.properties that hive is picking up (probably 
> hadoopcore/conf/log4j.properties).
> Revisiting this after a long time - I think this can be fixed with some 
> changes to MapRedTask.java (need to add hive-log4j.properties to hadoop 
> classpath here and then reset log4j using this in execdriver). Feel free to 
> file a jira if this is too irritating ..




how to set the debug parameters of hive?

2010-06-11 Thread Zhou Shuaifeng
Hi, I want to debug Hive remotely; how do I set the config?
E.g., debugging HDFS works by setting DEBUG_PARAMETERS in the file 'bin/hadoop',
so how do I set the debug parameters of Hive?
Thanks a lot.


