RE: Hive Custom UDF - "hive.aux.jars.path" not working

2011-08-22 Thread Chinna
Hi,

You need to point the property at the jar file itself, like this:

~/Documents/workspace/Hive_0_7_1/build/dist/conf$ grep aux hive-site.xml
<name>hive.aux.jars.path</name>
<value>/Users/amsharma/dev/Perforce/development/depot/dataeng/hive/dist/{YOUR_JAR_NAME}.jar</value>



 

You are using CLI mode, so starting the shell after changing the value is
enough. Hive can also run in another mode, as the HiveServer; in that case you
need to restart the server after changing the value (see the sketch below).
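
For reference, the standalone server is started through the hive CLI's
service option (the service name as of Hive 0.7; 10000 is the default port),
so a restart just means stopping that process and starting it again — a
sketch, not a supervised setup:

    # stop the running HiveServer process, then:
    hive --service hiveserver 10000 &

The CLI, by contrast, reads hive-site.xml afresh each time it starts, which is
why opening a new shell is enough in CLI mode.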

 

Thanks,

Chinna Rao Lalam


From: Amit Sharma [mailto:amitsharma1...@gmail.com] 
Sent: Tuesday, August 23, 2011 3:35 AM
To: user@hive.apache.org
Subject: Re: Hive Custom UDF - "hive.aux.jars.path" not working

 

Hi Vaibhav,
  Excuse my ignorance, as I'm a little new to Hive. What do you mean by
restarting the Hive Server? I am using the Hive interactive shell for my work,
so I start the shell after modifying the config variable. Which server do I
need to restart?

Amit

On Mon, Aug 22, 2011 at 2:49 PM, Aggarwal, Vaibhav wrote:

Did you restart the hive server after modifying the hive-site.xml settings?

I think you need to restart the server to pick up the latest settings in the
config file.

 

Thanks

Vaibhav

 

From: Amit Sharma [mailto:amitsharma1...@gmail.com] 
Sent: Monday, August 22, 2011 2:42 PM
To: user@hive.apache.org
Subject: Hive Custom UDF - "hive.aux.jars.path" not working

 

Hi,
  I built custom UDFs for Hive and they seem to work fine when I explicitly
register the jars using the "add jar <jar path>" command or put the path in
the environment variable HIVE_AUX_JARS_PATH. But if I add it as a
configuration variable in the hive-site.xml file and try to register the
function using "create temporary function <name> as '<class>'", it cannot
find the jar. Any idea what's going on here?

Here is the snippet from hive-site.xml:

~/Documents/workspace/Hive_0_7_1/build/dist/conf$ grep aux hive-site.xml
<name>hive.aux.jars.path</name>
<value>/Users/amsharma/dev/Perforce/development/depot/dataeng/hive/dist</value>

Amit

 



Re: hive-0.7.1: TestCliDriver FAILED

2011-08-22 Thread 李 冰
From the result, we can see that the difference between the source file and
the target file is only the path, which should be masked when being compared.
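
For reference, a single failing case can be re-run on its own while
investigating (the -Dtestcase/-Dqfile options are from the Hive developer
docs; exact flags may vary by version):

    ant test -Dtestcase=TestCliDriver -Dqfile=script_pipe.q -Dtest.silent=false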

Bing

--- On Mon, 8/22/11, 李 冰 wrote:

From: 李 冰
Subject: hive-0.7.1: TestCliDriver FAILED
To: user@hive.apache.org
Cc: d...@hive.apache.org
Date: Monday, August 22, 2011, 10:43 PM

Hi, all
When I try to run the standard test cases in Hive 0.7.1 against the Sun 1.6
JDK, I find that TestCliDriver fails.

The version of the JDK I used is:
java version "1.6.0_27-ea"
Java(TM) SE Runtime Environment (build 1.6.0_27-ea-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b03, mixed mode)


My steps:
1. ant clean
2. ant package
3. ant test

Here is a snapshot of the failure:

    [junit] Done query: script_env_var2.q
    [junit] Begin query: script_pipe.q
    [junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
    [junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
    [junit] at junit.framework.Assert.fail(Assert.java:47)
    [junit] at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_script_pipe(TestCliDriver.java:21067)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] at java.lang.reflect.Method.invoke(Method.java:597)
    [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] at junit.framework.TestResult.run(TestResult.java:109)
    [junit] at junit.framework.TestCase.run(TestCase.java:118)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
    [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I USING 'java -cp 
/home/libing/hive-0.7.1/src/build/ql/test/logs/clientpositive/script_pipe.q.out 
/home/libing/hive-0.7.1/src/ql/src/test/results/clientpositive/script_pipe.q.out
    [junit] 143c143,144
    [junit] < POSTHOOK: Output: 
file:/tmp/libing/hive_2011-08-21_23-27-41_670_8767305526316071428/-mr-1
    [junit] ---
    [junit] > POSTHOOK: Output: 
file:/tmp/sdong/hive_2011-02-10_17-04-27_817_7785884157237702561/-mr-1
    [junit] > 238   val_238 238 val_238
    [junit] Exception: Client execution results failed with error code = 1
    [junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
    [junit] Begin query: select_as_omitted.q

Has anyone run into this before?

Thanks


Re: Hive Custom UDF - "hive.aux.jars.path" not working

2011-08-22 Thread Amit Sharma
Hi Vaibhav,
  Excuse my ignorance, as I'm a little new to Hive. What do you mean by
restarting the Hive Server? I am using the Hive interactive shell for my work,
so I start the shell after modifying the config variable. Which server do I
need to restart?

Amit

On Mon, Aug 22, 2011 at 2:49 PM, Aggarwal, Vaibhav wrote:

> Did you restart the hive server after modifying the hive-site.xml settings?
> 
>
> I think you need to restart the server to pick up the latest settings in
> the config file.
>
>
>
> Thanks
>
> Vaibhav
>
>
>
> From: Amit Sharma [mailto:amitsharma1...@gmail.com]
> Sent: Monday, August 22, 2011 2:42 PM
> To: user@hive.apache.org
> Subject: Hive Custom UDF - "hive.aux.jars.path" not working
>
>
>
> Hi,
>   I built custom UDFs for Hive and they seem to work fine when I explicitly
> register the jars using the "add jar <jar path>" command or put the path in
> the environment variable HIVE_AUX_JARS_PATH. But if I add it as a
> configuration variable in the hive-site.xml file and try to register the
> function using "create temporary function <name> as '<class>'", it cannot
> find the jar. Any idea what's going on here?
>
> Here is the snippet from hive-site.xml:
>
> ~/Documents/workspace/Hive_0_7_1/build/dist/conf$ grep aux hive-site.xml
>
> <name>hive.aux.jars.path</name><value>/Users/amsharma/dev/Perforce/development/depot/dataeng/hive/dist</value>
>
> Amit
>


RE: Hive Custom UDF - "hive.aux.jars.path" not working

2011-08-22 Thread Aggarwal, Vaibhav
Did you restart the hive server after modifying the hive-site.xml settings?
I think you need to restart the server to pick up the latest settings in the 
config file.

Thanks
Vaibhav

From: Amit Sharma [mailto:amitsharma1...@gmail.com]
Sent: Monday, August 22, 2011 2:42 PM
To: user@hive.apache.org
Subject: Hive Custom UDF - "hive.aux.jars.path" not working

Hi,
  I built custom UDFs for Hive and they seem to work fine when I explicitly
register the jars using the "add jar <jar path>" command or put the path in
the environment variable HIVE_AUX_JARS_PATH. But if I add it as a
configuration variable in the hive-site.xml file and try to register the
function using "create temporary function <name> as '<class>'", it cannot
find the jar. Any idea what's going on here?

Here is the snippet from hive-site.xml:

~/Documents/workspace/Hive_0_7_1/build/dist/conf$ grep aux hive-site.xml
<name>hive.aux.jars.path</name>
<value>/Users/amsharma/dev/Perforce/development/depot/dataeng/hive/dist</value>

Amit


Hive Custom UDF - "hive.aux.jars.path" not working

2011-08-22 Thread Amit Sharma
Hi,
  I built custom UDFs for Hive and they seem to work fine when I explicitly
register the jars using the "add jar <jar path>" command or put the path in
the environment variable HIVE_AUX_JARS_PATH. But if I add it as a
configuration variable in the hive-site.xml file and try to register the
function using "create temporary function <name> as '<class>'", it cannot
find the jar. Any idea what's going on here?

Here is the snippet from hive-site.xml:

~/Documents/workspace/Hive_0_7_1/build/dist/conf$ grep aux hive-site.xml
<name>hive.aux.jars.path</name>
<value>/Users/amsharma/dev/Perforce/development/depot/dataeng/hive/dist</value>
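
For comparison, the explicit registration that does work looks like this (the
jar path and class name here are placeholders, not my real ones):

    hive> add jar /path/to/my-udfs.jar;
    hive> create temporary function my_func as 'com.example.hive.udf.MyFunc';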

Amit


One Schema Per Partition? (Multiple schemas per table?)

2011-08-22 Thread Time Less
I found a set of slides from Facebook online about Hive that claims you can
have one schema per partition in a table. This is exciting to us, because we
have a table like so:

id int
name   string
level  int
date   string

And it's broken up into partitions by date. However, on a particular date
last year, the table dramatically changed its schema to:

id   int
levelint
date string
name_id  int

So now if I do "select * from table" in hive, the data is completely garbled
for whichever portion of data doesn't fit the Hive schema. We are
considering re-writing the datafiles so they're the same before/after that
date, but if Hive supports having two entirely different schemas depending
on the partition, that'd be really convenient, since these datafiles are
hundreds of gigabytes in size (and we do sort of like the idea of knowing
how the datafile looked back then...).

This page:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements
doesn't seem to have an appropriate example, so I'm left wondering.

Has anyone done anything like this?
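
If it turns out not to be supported, the fallback we're considering could be
sketched as two tables plus a view papering over the difference (the table and
view names below are made up, and this is untested):

    CREATE VIEW game_log AS
    SELECT * FROM (
      SELECT id, name, level, date, CAST(NULL AS INT) AS name_id
      FROM game_log_old              -- partitions before the schema change
      UNION ALL
      SELECT id, CAST(NULL AS STRING) AS name, level, date, name_id
      FROM game_log_new              -- partitions after the schema change
    ) u;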

-- 
Tim Ellis
Data Architect, Riot Games


Re: Passing table properties to the InputFormat

2011-08-22 Thread Shantian Purkad
I have been able to get the table properties in the InputFormat as below.
However, I am not sure whether that is the correct way or whether there is a
better one.

Properties tableProperties = Utilities.getMapRedWork(job)
    .getPathToPartitionInfo()
    .get(getInputPaths(job)[0].toString())
    .getTableDesc()
    .getProperties();
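
For anyone else trying this, a fuller sketch of where that call can sit. The
class name and the 'total.fields.count' property are the ones from my original
question below; extending TextInputFormat and forwarding the count through the
job conf are assumptions for the sketch, not anything standard:

import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class DelimitedInputFormat extends TextInputFormat {
  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    // Look up the partition/table description registered for this job's
    // (single) input path and pull its table properties, as above.
    Path[] dirs = getInputPaths(job);
    Properties tableProperties = Utilities.getMapRedWork(job)
        .getPathToPartitionInfo()
        .get(dirs[0].toString())
        .getTableDesc()
        .getProperties();
    // Hand the column count to the record reader through the job conf.
    job.set("total.fields.count",
        tableProperties.getProperty("total.fields.count", "0"));
    return super.getSplits(job, numSplits);
  }
}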





From: Shantian Purkad 
To: "user@hive.apache.org" 
Sent: Saturday, August 20, 2011 5:01 PM
Subject: Passing table properties to the InputFormat


Hi,

I have a custom Input format that reads multiple lines as one row based on 
number of columns in a table.

I want to dynamically pass the table properties (like the number of columns in
the table, their data types, etc., just like what you get in a SerDe) to the
input format. How can I do that?

If that is not possible, and a SerDe is the only option, how can I use my
custom record reader with a SerDe?

My table definition is:


create table delimited_data_serde
(
col1 int,
col2 string,
col3 int,
col4 string,
col5 string,
col6 string
)

STORED AS INPUTFORMAT 'fwrk.hadoop.input.DelimitedInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
;

The input format needs the property 'total.fields.count'='6'.
If I set this using "set total.fields.count=6;" it works; however, I would
have to change this property for every table that uses the custom input
format before querying that table.
How can I automatically get a handle to the table properties in the input
format?
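
One thing I am trying, so that the count travels with each table instead of a
session-level "set" (a syntax sketch, untested):

create table delimited_data_serde
(
col1 int,
col2 string,
col3 int,
col4 string,
col5 string,
col6 string
)
STORED AS INPUTFORMAT 'fwrk.hadoop.input.DelimitedInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES ('total.fields.count'='6')
;

The TBLPROPERTIES entries should then appear among the table properties the
InputFormat sees.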

Regards,
Shantian

Local and remote metastores

2011-08-22 Thread Alex Holmes
Hi everyone,

Does anyone know the differences between local and remote Hive
metastores?  Are there features that are only provided by the remote
metastore (like authorization)?  Is the use of a local metastore
recommended in production?
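
For concreteness, the two setups I mean, as I understand them (a sketch; the
host names and URLs are placeholders):

<!-- local metastore: each client talks JDBC directly to the database -->
<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost/metastore</value>
</property>

<!-- remote metastore: clients talk Thrift to a metastore server -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastorehost:9083</value>
</property>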

Many thanks,
Alex


Re: org.apache.hadoop.fs.ChecksumException: Checksum error:

2011-08-22 Thread W S Chung
I tried using hadoop fs -copyToLocal. I also get a stack trace, like this:

11/08/22 10:53:57 INFO fs.FSInputChecker: Found checksum error:
b[1024, 
1536]=31325431393a32313a31315a7c3137342e3235332e3234352e3232377c39376261623664642d353062342d343461612d383235642d6537336238646434336563337c36373842303935453945304431374635383833344135464336423341424646357c342e327c313931393638200a323031312d30352d31325431393a32313a31315a7c3137342e3235332e3234352e3232377c39376261623664642d353062342d343461612d383235642d6537336238646434336563337c36373842303935453945304031374635383833344135464336423341424646357c342e322e317c313931393638200a323031312d30352d31325431393a32323a33395a7c3137342e3235332e3234352e3232377c39376261623664642d353062342d343461612d383235642d6537336238646434336563337c36373842303935453945304431374635383833344135464336423341424646357c362e322e317c313837373837200a323031312d30352d31325431393a32323a34335a7c3137342e3235332e3234352e3232377c39376261623664642d353062342d343461612d383235642d6537336238646434336563337c36373842303935453945304431374635383833344135464336423341424646357c362e337c3138373738375f61745f706f736974696f6e5f3835200a323031312d30352d31325431393a32323a34335a7c3137342e
org.apache.hadoop.fs.ChecksumException: Checksum error:
/blk_2722854101062410251:of:/user/hive/warehouse/att_log/collect_time=1314024490064/load.dat
at 64635904
at 
org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at 
org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at 
org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1158)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1718)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1770)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:53)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:72)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:320)
at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:248)
at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:199)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1754)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
11/08/22 10:53:57 WARN hdfs.DFSClient: Found Checksum error for
blk_2722854101062410251_1038 from 192.168.50.192:50010 at 64635904
11/08/22 10:53:57 INFO hdfs.DFSClient: Could not obtain block
blk_2722854101062410251_1038 from any node:  java.io.IOException: No
live nodes contain current block
copyToLocal: Checksum error:
/blk_2722854101062410251:of:/user/hive/warehouse/att_log/collect_time=1314024490064/load.dat
at 64635904


I managed to load two files (by using the Java API copyFromLocal call
and then a 'load data inpath' call to load the data into the table).
hadoop fsck does not show a corrupted block until I run the 'select
count(*)' query after loading the second file. 'hadoop fs -copyToLocal'
also only fails after hadoop fsck shows a corrupted block. For the first
loaded file, 'hadoop fs -copyToLocal' works fine. It does look like
the problem is in HDFS.
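
For reference, the check I am running (the path is my table's warehouse
directory; the flags are standard fsck options):

    hadoop fsck /user/hive/warehouse/att_log -files -blocks -locations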

I originally discovered this issue on a two-node cluster with a
replication factor of 2, but I am now testing on a pseudo-distributed
install with only one node and a replication factor of 1.

I am using text files. I would like to try using sequencefiles; I
understand the "io.skip.checksum.errors" setting only applies to
sequencefiles. But the only way I know to load data into a table stored
as sequencefile is to first load the text file into a table stored as
textfile and then use an 'insert into ... select' to copy the data into
the sequencefile table. That 'insert into ... select' already fails with
the same problem as running a query on the textfile table.
Is there any other way to load a sequencefile table?
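
For reference, the textfile-to-sequencefile route I described looks like this
(table names are from my setup; the column list is elided here):

    create table att_log_seq ( <same columns as att_log> )
    stored as sequencefile;
    insert overwrite table att_log_seq select * from att_log;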



On Fri, Aug 19, 2011 at 8:57 PM, Aggarwal, Vaibhav  wrote:
> This is a really curious case.
>
> How many replicas of each block do you have?
>
> Are you able to copy the data directly using HDFS client?
> You could try the hadoop fs -copyToLocal command and see if it can copy the 
> data from hdfs correctly.
>
> That would help you verify that the issue really is at HDFS layer (though it 
> does look like that from the stack trace).
>
> Which file format are you using?
>
> Thanks
> Vaibhav
>
> -----Original Message-----
> From: W S Chung [mailto:qp.wsch...@gmail.com]
> Sent: Friday, August 19, 2011 3:26 PM
> To: user@hive.apache.org
> Subject: org.apache.hadoop.fs.ChecksumException: Checksum error:
>
> For some reason, my questions sent two days ago again never show up, even 
> though I can goo

hive-0.7.1: TestCliDriver FAILED

2011-08-22 Thread 李 冰
Hi, all
When I try to run the standard test cases in Hive 0.7.1 against the Sun 1.6
JDK, I find that TestCliDriver fails.

The version of the JDK I used is:
java version "1.6.0_27-ea"
Java(TM) SE Runtime Environment (build 1.6.0_27-ea-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b03, mixed mode)


My steps:
1. ant clean
2. ant package
3. ant test

Here is a snapshot of the failure:

    [junit] Done query: script_env_var2.q
    [junit] Begin query: script_pipe.q
    [junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
    [junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
    [junit] at junit.framework.Assert.fail(Assert.java:47)
    [junit] at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_script_pipe(TestCliDriver.java:21067)
    [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] at java.lang.reflect.Method.invoke(Method.java:597)
    [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] at junit.framework.TestResult.run(TestResult.java:109)
    [junit] at junit.framework.TestCase.run(TestCase.java:118)
    [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
    [junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
    [junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I USING 'java -cp 
/home/libing/hive-0.7.1/src/build/ql/test/logs/clientpositive/script_pipe.q.out 
/home/libing/hive-0.7.1/src/ql/src/test/results/clientpositive/script_pipe.q.out
    [junit] 143c143,144
    [junit] < POSTHOOK: Output: 
file:/tmp/libing/hive_2011-08-21_23-27-41_670_8767305526316071428/-mr-1
    [junit] ---
    [junit] > POSTHOOK: Output: 
file:/tmp/sdong/hive_2011-02-10_17-04-27_817_7785884157237702561/-mr-1
    [junit] > 238   val_238 238 val_238
    [junit] Exception: Client execution results failed with error code = 1
    [junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
    [junit] Begin query: select_as_omitted.q

Has anyone run into this before?

Thanks