Difficulties in executing lag function

2013-06-20 Thread Omkar Joshi
Hi,

I have an orders table created in Hive.

create table orders
( order_date TIMESTAMP,
product_id INT,
qty INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

I wish to execute a query similar to the following:

select product_id, order_date,
lag (order_date,1) over (ORDER BY order_date) AS prev_order_date
from orders
where product_id = 2000;

I'm referring to https://github.com/hbutani/SQLWindowing/wiki

I'm not sure I have got the syntax right - do we have to switch between the modes 
even within a single query?
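
(For reference, the same query should run unchanged on the native windowing
functions added in Hive 0.11, without the SQLWindowing jar or wmode switching -
just a sketch, assuming an upgrade from 0.9 is an option:)

-- Sketch: native LAG/OVER, available from Hive 0.11 onwards.
select product_id, order_date,
lag(order_date, 1) over (ORDER BY order_date) AS prev_order_date
from orders
where product_id = 2000;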


hduser@cldx-1151-1044:~/hadoop_ecosystem/apache_hive/hive_installation/hive-0.9.0/bin$
 hive --service windowingCli
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Hive history file=/tmp/hduser/hive_job_log_hduser_201306211707_783850714.txt
hive> wmode;
windowing mode is off
hive> select * from orders;
OK
2007-09-25 00:00:00    1000    20
2007-09-26 00:00:00    2000    15
2007-09-27 00:00:00    1000    8
2007-09-28 00:00:00    2000    12
2007-09-29 00:00:00    2000    2
2007-09-30 00:00:00    1000    4
Time taken: 37.862 seconds
hive> select product_id, order_date;wmode windowing;lag ('order_date,1') over 
(ORDER BY order_date) AS prev_order_date; wmode hive; from orders;
BR.recoverFromMismatchedToken
Exception in thread "main" java.lang.IncompatibleClassChangeError: the number 
of constructors during runtime and compile time for java.lang.Exception do not 
match. Expected 4 but got 5
at 
groovy.lang.MetaClassImpl.selectConstructorAndTransformArguments(MetaClassImpl.java:1400)
at 
org.codehaus.groovy.runtime.ScriptBytecodeAdapter.selectConstructorAndTransformArguments(ScriptBytecodeAdapter.java:234)
at 
com.sap.hadoop.windowing.WindowingException.<init>(WindowingException.groovy:21)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:77)
at 
org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:102)
at 
org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:54)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:182)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:190)
at 
com.sap.hadoop.windowing.runtime.WindowingShell.parse(WindowingShell.groovy:72)
at 
com.sap.hadoop.windowing.runtime.WindowingShell.execute(WindowingShell.groovy:126)
at com.sap.hadoop.windowing.runtime.WindowingShell$execute.call(Unknown 
Source)
at 
org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at 
com.sap.hadoop.windowing.cli.WindowingClient3.executeQuery(WindowingClient3.groovy:28)
at 
com.sap.hadoop.windowing.cli.WindowingClient3$executeQuery.call(Unknown Source)
at 
org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at 
com.sap.hadoop.windowing.WindowingHiveCliDriver.processCmd(WindowingHiveCliDriver.groovy:117)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
at org.apache.hadoop.hive.cli.CliDriver$processLine.call(Unknown Source)
at 
com.sap.hadoop.windowing.WindowingHiveCliDriver.main(WindowingHiveCliDriver.groovy:235)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

hive> wmode windowing;
hive> select product_id, order_date,lag ('order_date',1) AS prev_order_date 
from orders;
BR.recoverFromMismatchedToken
Exception in thread "main" java.lang.IncompatibleClassChangeError: the number 
of constructors during runtime and compile time for java.lang.Exception do not 
match. Expected 4 b

Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
look at it the other way around if you want.  knowing an array of a two-element
struct is topologically the same as a map - they darn well better be the
same. :)



On Thu, Jun 20, 2013 at 7:00 PM, Dean Wampler  wrote:

> It's not as "simple" as it seems, as I discovered yesterday, to my
> surprise. I created a table like this:
>
> CREATE TABLE t (
>   name STRING,
>   stuff   ARRAY<STRUCT<...>>);
>
> I then used an insert statement to see how Hive would store the records,
> so I could populate the real table with another process. Hive used ^A for
> the field separator, ^B for the collection separator, in this case, to
> separate structs in the array, and ^C to separate the elements in each
> struct, e.g.,:
>
> Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3
>
> In other words, the structure you would expect for this table:
>
> CREATE TABLE t (
>   name STRING,
>   stuff   MAP<STRING, INT>);
>
> We should have covered the permutations of nested structures in our book,
> but we didn't. It would be great to document them, for realz, somewhere.
>
> dean
>
> On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague wrote:
>
>> you only get three.  field separator, array elements separator (aka
>> collection delimiter), and map key/value separator (aka map key
>> delimiter).
>>
>> when you  nest deeper then you gotta use the default '^D', '^E' etc for
>> each level.  At least that's been my experience which i've found has worked
>> successfully.
>>
>>
>> On Thu, Jun 20, 2013 at 7:45 AM, neha  wrote:
>>
>>> Thanks a lot for your reply, Stephen.
>>> To answer your question - I was not aware of the fact that we could use
>>> delimiter (in my example, '|') for first level of nesting. I tried now and
>>> it worked fine.
>>>
>>> My next question - Is there any way to provide delimiter in DDL for
>>> second level of nesting?
>>> Thanks again!!
>>>
>>>
>>> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague wrote:
>>>
 its all there in the documentation under "create table" and it seems
 you got everything right too except one little thing - in your second
 example there for 'sample data loaded' - instead of '^B' change that to
 '|'  and you should be good. That's the delimiter that separates your two
 array elements - ie collections.

 i guess the real question for me is when you say 'since there is no way
 to use given delimiter "|" ' what did you mean by that?



 On Thu, Jun 20, 2013 at 1:42 AM, neha  wrote:

> Hi All,
>
> I have 2 questions about complex data types in nested composition.
>
> 1 >> I did not find a way to provide delimiter information in DDL if
> one or more column has nested array/struct. In this case, default 
> delimiter
> has to be used for complex type column.
> Please let me know if this is a limitation as of now or I am missing
> something.
>
> e.g.:
> *DDL*:
> hive> create table example(col1 int, col2
> array<struct<st1:int,st2:string>>) row format delimited fields terminated
> by ',';
> OK
> Time taken: 0.226 seconds
>
> *Sample data loaded:*
> 1,1^Cstring1^B2^Cstring2
>
> *O/P:*
> hive> select * from example;
> OK
> 1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
> Time taken: 0.288 seconds
>
> 2 >> For the same DDL given above, if we provide clause* collection
> items terminated by '|' *and still use default delimiters (since
> there is no way to use given delimiter '|') then the select query shows
> incorrect data.
> Please let me know if this is something expected.
>
> e.g.
> *DDL*:
> hive> create table example(col1 int, col2
> array<struct<st1:int,st2:string>>) row format delimited fields terminated
> by ',' collection items terminated by '|';
> OK
> Time taken: 0.175 seconds
>
> *Sample data loaded:*
> 1,1^Cstring1^B2^Cstring2
>
> *O/P:*
> hive> select * from example;
>
> OK
> 1[{"st1":1,"st2":"string1\u00022"}]
> Time taken: 0.141 seconds
> **
> Thanks & Regards.
>


>>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com
>


Re: Question regarding nested complex data type

2013-06-20 Thread Dean Wampler
It's not as "simple" as it seems, as I discovered yesterday, to my
surprise. I created a table like this:

CREATE TABLE t (
  name STRING,
  stuff   ARRAY<STRUCT<...>>);

I then used an insert statement to see how Hive would store the records, so
I could populate the real table with another process. Hive used ^A for the
field separator, ^B for the collection separator, in this case, to separate
structs in the array, and ^C to separate the elements in each struct, e.g.,:

Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3

In other words, the structure you would expect for this table:

CREATE TABLE t (
  name STRING,
  stuff   MAP<STRING, INT>);

We should have covered the permutations of nested structures in our book,
but we didn't. It would be great to document them, for realz, somewhere.
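
(As an aside, a sketch of how the three configurable delimiters seem to line up
with the nesting levels discussed in this thread - the struct field names below
are placeholders, not the ones from my table:)

-- Sketch only: explicit delimiters for the first two nesting levels.
CREATE TABLE t (
  name  STRING,
  stuff ARRAY<STRUCT<label:STRING, num:INT>>)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- ^A between name and stuff
  COLLECTION ITEMS TERMINATED BY '\002'  -- ^B between the structs in the array
  MAP KEYS TERMINATED BY '\003';         -- ^C between the fields inside each struct
-- Anything nested deeper falls back to the defaults ^D, ^E, ... as Stephen notes below.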

dean

On Thu, Jun 20, 2013 at 9:56 AM, Stephen Sprague  wrote:

> you only get three.  field separator, array elements separator (aka
> collection delimiter), and map key/value separator (aka map key
> delimiter).
>
> when you  nest deeper then you gotta use the default '^D', '^E' etc for
> each level.  At least that's been my experience which i've found has worked
> successfully.
>
>
> On Thu, Jun 20, 2013 at 7:45 AM, neha  wrote:
>
>> Thanks a lot for your reply, Stephen.
>> To answer your question - I was not aware of the fact that we could use
>> delimiter (in my example, '|') for first level of nesting. I tried now and
>> it worked fine.
>>
>> My next question - Is there any way to provide delimiter in DDL for
>> second level of nesting?
>> Thanks again!!
>>
>>
>> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague wrote:
>>
>>> its all there in the documentation under "create table" and it seems you
>>> got everything right too except one little thing - in your second example
>>> there for 'sample data loaded' - instead of '^B' change that to '|'  and
>>> you should be good. That's the delimiter that separates your two array
>>> elements - ie collections.
>>>
>>> i guess the real question for me is when you say 'since there is no way
>>> to use given delimiter "|" ' what did you mean by that?
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 1:42 AM, neha  wrote:
>>>
 Hi All,

 I have 2 questions about complex data types in nested composition.

 1 >> I did not find a way to provide delimiter information in DDL if
 one or more column has nested array/struct. In this case, default delimiter
 has to be used for complex type column.
 Please let me know if this is a limitation as of now or I am missing
 something.

 e.g.:
 *DDL*:
 hive> create table example(col1 int, col2
 array<struct<st1:int,st2:string>>) row format delimited fields terminated
 by ',';
 OK
 Time taken: 0.226 seconds

 *Sample data loaded:*
 1,1^Cstring1^B2^Cstring2

 *O/P:*
 hive> select * from example;
 OK
 1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
 Time taken: 0.288 seconds

 2 >> For the same DDL given above, if we provide clause* collection
 items terminated by '|' *and still use default delimiters (since there
 is no way to use given delimiter '|') then the select query shows incorrect
 data.
 Please let me know if this is something expected.

 e.g.
 *DDL*:
 hive> create table example(col1 int, col2
 array<struct<st1:int,st2:string>>) row format delimited fields terminated
 by ',' collection items terminated by '|';
 OK
 Time taken: 0.175 seconds

 *Sample data loaded:*
 1,1^Cstring1^B2^Cstring2

 *O/P:*
 hive> select * from example;

 OK
 1[{"st1":1,"st2":"string1\u00022"}]
 Time taken: 0.141 seconds
 **
 Thanks & Regards.

>>>
>>>
>>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: INSERT non-static data to array?

2013-06-20 Thread Michael Malak
My understanding is that LATERAL VIEW goes the other direction: takes an array 
and makes it into separate rows.  I use that a lot.  But I also need to go the 
other way sometimes: take a bunch of rows and squeeze them down into an array.  
Please correct me if I'm missing something.
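
(For what it's worth, a sketch of the rows-to-array direction using the
collect_set UDAF - the table and column names here are made up, and collect_set
drops duplicates and ordering:)

-- Sketch: collapse many rows into one array per group.
SELECT parent_id,
       collect_set(child_value) AS child_values
FROM child_rows
GROUP BY parent_id;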
 


 From: Edward Capriolo 
To: "user@hive.apache.org" ; Michael Malak 
 
Sent: Thursday, June 20, 2013 9:15 PM
Subject: Re: INSERT non-static data to array?
  


i think you could select into a sub query and then use lateral view. not
exactly the same but something similar could be done.

On Thursday, June 20, 2013, Michael Malak  wrote:
> I've created
> https://issues.apache.org/jira/browse/HIVE-4771
>
> to track this issue.
>
>
> - Original Message -
> From: Michael Malak 
> To: "user@hive.apache.org" 
> Cc:
> Sent: Wednesday, June 19, 2013 2:35 PM
> Subject: Re: INSERT non-static data to array?
>
> The example code for inline_table() there has static data.  It's not possible 
> to use a subquery inside the inline_table() or array() is it?
>
> The SQL1999 way is described here:
>
> http://www.postgresql.org/message-id/20041028232152.ga76...@winnie.fuhr.org
>
>
> CREATE TABLE table_a(a int, b int, c int[]);
>
> INSERT INTO table_a
>   SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent = table_b.id)
>   FROM table_b
>
> 
> From: Edward Capriolo 
> To: "user@hive.apache.org" ; Michael Malak 
> 
> Sent: Wednesday, June 19, 2013 2:06 PM
> Subject: Re: INSERT non-static data to array?
>
>
>
> : https://issues.apache.org/jira/browse/HIVE-3238
>
>
> This might fit the bill.
>
>
>
>
> On Wed, Jun 19, 2013 at 3:23 PM, Michael Malak  wrote:
>
> Is the only way to INSERT data into a column of type array<> to load data 
> from a pre-existing file, to use hard-coded values in the INSERT statement, 
> or copy an entire array verbatim from another table?  I.e. I'm assuming that 
> a) SQL1999 array INSERT via subquery is not (yet) implemented in Hive, and b) 
> there is also no other way to load dynamically generated data into an array<> 
> column?  If my assumption in a) is true, does a Jira item need to be created 
> for it?
>>
>

Re: INSERT non-static data to array?

2013-06-20 Thread Edward Capriolo
 i think you could select into a sub query and then use lateral view. not
exactly the same but something similar could be done.

On Thursday, June 20, 2013, Michael Malak  wrote:
> I've created
> https://issues.apache.org/jira/browse/HIVE-4771
>
> to track this issue.
>
>
> - Original Message -
> From: Michael Malak 
> To: "user@hive.apache.org" 
> Cc:
> Sent: Wednesday, June 19, 2013 2:35 PM
> Subject: Re: INSERT non-static data to array?
>
> The example code for inline_table() there has static data.  It's not
possible to use a subquery inside the inline_table() or array() is it?
>
> The SQL1999 way is described here:
>
>
http://www.postgresql.org/message-id/20041028232152.ga76...@winnie.fuhr.org
>
>
> CREATE TABLE table_a(a int, b int, c int[]);
>
> INSERT INTO table_a
>   SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent =
table_b.id)
>   FROM table_b
>
> 
> From: Edward Capriolo 
> To: "user@hive.apache.org" ; Michael Malak <
michaelma...@yahoo.com>
> Sent: Wednesday, June 19, 2013 2:06 PM
> Subject: Re: INSERT non-static data to array?
>
>
>
> : https://issues.apache.org/jira/browse/HIVE-3238
>
>
> This might fit the bill.
>
>
>
>
> On Wed, Jun 19, 2013 at 3:23 PM, Michael Malak 
wrote:
>
> Is the only way to INSERT data into a column of type array<> to load data
from a pre-existing file, to use hard-coded values in the INSERT statement,
or copy an entire array verbatim from another table?  I.e. I'm assuming
that a) SQL1999 array INSERT via subquery is not (yet) implemented in Hive,
and b) there is also no other way to load dynamically generated data into
an array<> column?  If my assumption in a) is true, does a Jira item need
to be created for it?
>>
>


Re: INSERT non-static data to array?

2013-06-20 Thread Michael Malak
I've created
https://issues.apache.org/jira/browse/HIVE-4771

to track this issue.


- Original Message -
From: Michael Malak 
To: "user@hive.apache.org" 
Cc: 
Sent: Wednesday, June 19, 2013 2:35 PM
Subject: Re: INSERT non-static data to array?

The example code for inline_table() there has static data.  It's not possible 
to use a subquery inside the inline_table() or array() is it?

The SQL1999 way is described here:

http://www.postgresql.org/message-id/20041028232152.ga76...@winnie.fuhr.org


CREATE TABLE table_a(a int, b int, c int[]);

INSERT INTO table_a
  SELECT a, b, ARRAY(SELECT c FROM table_c WHERE table_c.parent = table_b.id)
  FROM table_b
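
(A rough Hive approximation of that SQL1999 pattern, for what it's worth - just
a sketch against the same hypothetical tables, using a join plus the collect_set
UDAF, which drops duplicates and ordering:)

-- Sketch: approximate ARRAY(subquery) with a join and collect_set.
INSERT OVERWRITE TABLE table_a
SELECT b.a, b.b, collect_set(c.c)
FROM table_b b
LEFT OUTER JOIN table_c c ON (c.parent = b.id)
GROUP BY b.id, b.a, b.b;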


From: Edward Capriolo 
To: "user@hive.apache.org" ; Michael Malak 
 
Sent: Wednesday, June 19, 2013 2:06 PM
Subject: Re: INSERT non-static data to array?



: https://issues.apache.org/jira/browse/HIVE-3238


This might fit the bill.




On Wed, Jun 19, 2013 at 3:23 PM, Michael Malak  wrote:

Is the only way to INSERT data into a column of type array<> to load data from 
a pre-existing file, to use hard-coded values in the INSERT statement, or copy 
an entire array verbatim from another table?  I.e. I'm assuming that a) SQL1999 
array INSERT via subquery is not (yet) implemented in Hive, and b) there is 
also no other way to load dynamically generated data into an array<> column?  
If my assumption in a) is true, does a Jira item need to be created for it?
>


Re: Run queries from external files as subqueries

2013-06-20 Thread Jan Dolinár
Quick and dirty way to do such thing would be to use some kind of
preprocessor. To avoid writing one, you could use e.g. the one from GCC,
with just a little help from sed:

gcc -E -x c query.hql -o- | sed '/#/d' > preprocessed.hql
hive -f preprocessed.hql

Where query.hql can contain for example something like

SELECT * FROM (
#include "subquery.hql"
) t
WHERE id = 1;

The includes can be nested and multiplied as much as necessary. As a bonus,
you could also use #define for repeated parts of code and/or #ifdef to
build different queries based on parameters passed to gcc ;-)
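
To make the mechanics concrete - a sketch with made-up table and column names.
If subquery.hql contains

SELECT id, name FROM users WHERE active = 1

(no trailing semicolon, since it gets inlined), then after the gcc/sed step
above preprocessed.hql would read:

SELECT * FROM (
SELECT id, name FROM users WHERE active = 1
) t
WHERE id = 1;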

Best regards,
Jan Dolinar


On Thu, Jun 20, 2013 at 10:09 PM, Bertrand Dechoux wrote:

> I am afraid that there is no automatic way of doing so. But that would be
> the same answer whether the question is about hive or any relational
> database.
> (I would be glad to have counter examples.)
>
> You might want to look at oozie in order to manage the workflow. But the
> creation of the workflow is manual indeed.
> http://oozie.apache.org/
>
> Regards
>
> Bertrand
>
>
>
>
> On Thu, Jun 20, 2013 at 9:59 PM, Sha Liu  wrote:
>
>> Hi,
>>
>> While working on some complex queries with multiple levels of subqueries,
>> I'm wondering if it is possible in Hive to refactor these subqueries into
>> different files and instruct the enclosing query to execute these files.
>> This way these subqueries can potentially be reused by other queries or
>> just run by themselves.
>>
>> Thanks,
>> Sha Liu
>>
>
>
>
> --
> Bertrand Dechoux
>


Re: Run queries from external files as subqueries

2013-06-20 Thread Bertrand Dechoux
I am afraid that there is no automatic way of doing so. But that would be
the same answer whether the question is about hive or any relational
database.
(I would be glad to have counter examples.)

You might want to look at oozie in order to manage the workflow. But the
creation of the workflow is manual indeed.
http://oozie.apache.org/

Regards

Bertrand




On Thu, Jun 20, 2013 at 9:59 PM, Sha Liu  wrote:

> Hi,
>
> While working on some complex queries with multiple levels of subqueries,
> I'm wondering if it is possible in Hive to refactor these subqueries into
> different files and instruct the enclosing query to execute these files.
> This way these subqueries can potentially be reused by other queries or
> just run by themselves.
>
> Thanks,
> Sha Liu
>



-- 
Bertrand Dechoux


Run queries from external files as subqueries

2013-06-20 Thread Sha Liu
Hi,
While working on some complex queries with multiple levels of subqueries, I'm
wondering if it is possible in Hive to refactor these subqueries into different
files and instruct the enclosing query to execute these files. This way these
subqueries can potentially be reused by other queries or just run by
themselves.
Thanks,
Sha Liu

Re: "show table" throwing strange error

2013-06-20 Thread Sanjay Subramanian
Can u try from your ubuntu command prompt
$> hive -e "show tables"

From: Mohammad Tariq <donta...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, June 20, 2013 4:28 AM
To: user <user@hive.apache.org>
Subject: Re: "show table" throwing strange error

Thank you for the response ma'am. It didn't help either.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind <sunitarv...@gmail.com> wrote:
Your issue seems familiar. Try logging out of hive session and re-login.

Sunita


On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq <donta...@gmail.com> wrote:
Hello list,

 I have a hive(0.9.0) setup on my Ubuntu box running hadoop-1.0.4. 
Everything was going smooth till now. But today when I issued show tables I got 
some strange error on the CLI. Here is the error :

hive> show tables;
FAILED: Parse Error: line 1:0 character '' not supported here
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
line 1:10 character '' not supported here
line 1:11 character '' not supported here
line 1:12 character '' not supported here
line 1:13 character '' not supported here
line 1:14 character '' not supported here
line 1:15 character '' not supported here
line 1:16 character '' not supported here
line 1:17 character '' not supported here
line 1:18 character '' not supported here
line 1:19 character '' not supported here
line 1:20 character '' not supported here
line 1:21 character '' not supported here
line 1:22 character '' not supported here
line 1:23 character '' not supported here
line 1:24 character '' not supported here
line 1:25 character '' not supported here
line 1:26 character '' not supported here
line 1:27 character '' not supported here
line 1:28 character '' not supported here
line 1:29 character '' not supported here
line 1:30 character '' not supported here
line 1:31 character '' not supported here
line 1:32 character '' not supported here
line 1:33 character '' not supported here
line 1:34 character '' not supported here
line 1:35 character '' not supported here
line 1:36 character '' not supported here
line 1:37 character '' not supported here
line 1:38 character '' not supported here
line 1:39 character '' not supported here
line 1:40 character '' not supported here
line 1:41 character '' not supported here
line 1:42 character '' not supported here
line 1:43 character '' not supported here
line 1:44 character '' not supported here
line 1:45 character '' not supported here
line 1:46 character '' not supported here
line 1:47 character '' not supported here
line 1:48 character '' not supported here
line 1:49 character '' not supported here
line 1:50 character '' not supported here
line 1:51 character '' not supported here
line 1:52 character '' not supported here
line 1:53 character '' not supported here
line 1:54 character '' not supported here
line 1:55 character '' not supported here
line 1:56 character '' not supported here
line 1:57 character '' not supported here
line 1:58 character '' not supported here
line 1:59 character '' not supported here
line 1:60 character '' not supported here
line 1:61 character '' not supported here
line 1:62 character '' not supported here
line 1:63 character '' not supported here
line 1:64 character '' not supported here
line 1:65 character '' not supported here
line 1:66 character '' not supported here
line 1:67 character '' not supported here
line 1:68 character '' not supported here
line 1:69 character '' not supported here
line 1:70 character '' not supported here
line 1:71 character '' not supported here
line 1:72 character '' not supported here
line 1:73 character '' not supported here
line 1:74 character '' not supported here
line 1:75 character '' not supported here
line 1:76 character '' not supported here
line 1:77 character '' not supported here
line 1:78 character '' not supported here
line 1:79 character '' not supported here
.
.
.
.
.
.
line 1:378 character '' not supported here
line 1:379 character '' not supported here
line 1:380 character '' not supported here
line 1:381 character '' not supported here

Strangely other queries like select foo from pokes where bar = 'tariq'; are 
working fine. Tried to search over the net but could not find anything 
useful. Need some help.

Thank you so much for your time.

Warm Regards,
Tariq
cloudfront.blogspot.com



CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged infor

Re: Hive External Table issue

2013-06-20 Thread Stephen Sprague
i agree.

conclusion: unless you're some kind of hive guru use a directory location
and get that to work before trying to get clever with file locations -
especially when you see an error message about "not a directory and unable
to create it" :)   Walk before you run good people.


On Thu, Jun 20, 2013 at 11:55 AM, Nitin Pawar wrote:

> Ramki,
>
> I was going through that thread before as Sanjeev said it worked so I was
> doing some experiment as well.
> As you I too had the impression that Hive tables are associated with
> directories and as pointed out I was wrong.
>
> Basically the idea of pointing a table to a file as mentioned on that
> thread is kind of hack
> create table without location
> alter table to point to file
>
> From Mark's answer what he suggest is we can use virtual column
> INPUT__FILE__NAME to select which file we want to use while querying in
> case there are multiple files inside a directory and you just want to use a
> specific one.
>
> The bug I mentioned is for  files, having particular files from a
> directory matching the regex. Not for the regex serde.
>
> Correct my understanding if I got anything wrong
>
>
>
>
> On Fri, Jun 21, 2013 at 12:04 AM, Ramki Palle wrote:
>
>> Nitin,
>>
>> Can you go through the thread with subject "S3/EMR Hive: Load contents
>> of a single file"  on Tue, 26 Mar, 17:11> at
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1
>>
>>  This gives the whole discussion about the topic of table location
>> pointing to a filename vs. directory.
>>
>> Can you give your insight from this discussion and the discussion you
>> mentioned at stackoverflow link?
>>
>> Regards,
>> Ramki.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:
>>
>>> Also see this JIRA
>>> https://issues.apache.org/jira/browse/HIVE-951
>>>
>>> I think issue you are facing is due to the JIRA
>>>
>>>
>>> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar 
>>> wrote:
>>>
 Mark has answered this before

 http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

 If this link does not answer your question, do let us know


 On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar <
 sanjeev.sa...@gmail.com> wrote:

> Two issues:
>
> 1. I've created external tables in hive based on file location before
> and it work without any issue. It don't have to be a directory.
>
> 2. If there are more than one file in the directory, and you create
> external table based on directory then how the table knows that which file
> it need to look for the data?
>
> I tried to create the table based on directory, it created the table
> but all the rows were NULL.
>
> -Sanjeev
>
>
> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar  > wrote:
>
>> in hive when you create table and use the location to refer hdfs
>> path, that path is supposed to be a directory.
>> If the directory is not existing it will try to create it and if its
>> a file it will throw an error as its not a directory
>>
>> thats the error you are getting that location you referred is a file.
>> Change it to the directory and see if that works for you
>>
>>
>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
>> sanjeev.sa...@gmail.com> wrote:
>>
>>> I did mention in my mail the hdfs file exists in that location. See
>>> below
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> so the directory and file both exists.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar <
>>> nitinpawar...@gmail.com> wrote:
>>>
 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)


 it clearly says its not a directory. Point to the dictory and it
 will work


 On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
 sanjeev.sa...@gmail.com> wrote:

> Hello Everyone, I'm running into the following Hive external table
> issue.
>
>
>
> hive> CREATE EXTERNAL TABLE access(
>
>  >   host STRING,
>
>  >   identity STRING,
>
>  >   user STRING,
>
>  >   time STRING,
>
>  >   request STRING,
>
>  >   status ST

Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Ramki,

I was going through that thread before as Sanjeev said it worked so I was
doing some experiment as well.
Like you, I too had the impression that Hive tables are associated with
directories, and as pointed out I was wrong.

Basically the idea of pointing a table to a file as mentioned on that
thread is kind of a hack:
create table without location
alter table to point to the file

From Mark's answer, what he suggests is that we can use the virtual column
INPUT__FILE__NAME to select which file we want to use while querying, in
case there are multiple files inside a directory and you just want to use a
specific one.
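
For example (a sketch, reusing the access table and FlumeData file name from
this thread; any file-matching predicate would do):

-- Sketch: read only one file out of the table's directory
SELECT host, status
FROM access
WHERE INPUT__FILE__NAME LIKE '%FlumeData.1371144648033';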

The bug I mentioned is about picking particular files from a directory by
matching a regex, not about the regex serde.

Correct my understanding if I got anything wrong




On Fri, Jun 21, 2013 at 12:04 AM, Ramki Palle  wrote:

> Nitin,
>
> Can you go through the thread with subject "S3/EMR Hive: Load contents of
> a single file"  on Tue, 26 Mar, 17:11> at
>
>
> http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1
>
>  This gives the whole discussion about the topic of table location
> pointing to a filename vs. directory.
>
> Can you give your insight from this discussion and the discussion you
> mentioned at stackoverflow link?
>
> Regards,
> Ramki.
>
>
>
> On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:
>
>> Also see this JIRA
>> https://issues.apache.org/jira/browse/HIVE-951
>>
>> I think issue you are facing is due to the JIRA
>>
>>
>> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:
>>
>>> Mark has answered this before
>>>
>>> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>>>
>>> If this link does not answer your question, do let us know
>>>
>>>
>>> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar >> > wrote:
>>>
 Two issues:

 1. I've created external tables in hive based on file location before
 and it work without any issue. It don't have to be a directory.

 2. If there are more than one file in the directory, and you create
 external table based on directory then how the table knows that which file
 it need to look for the data?

 I tried to create the table based on directory, it created the table
 but all the rows were NULL.

 -Sanjeev


 On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar 
 wrote:

> in hive when you create table and use the location to refer hdfs path,
> that path is supposed to be a directory.
> If the directory is not existing it will try to create it and if its a
> file it will throw an error as its not a directory
>
> thats the error you are getting that location you referred is a file.
> Change it to the directory and see if that works for you
>
>
> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
> sanjeev.sa...@gmail.com> wrote:
>
>> I did mention in my mail the hdfs file exists in that location. See
>> below
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> so the directory and file both exists.
>>
>>
>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar <
>> nitinpawar...@gmail.com> wrote:
>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>>
>>> it clearly says its not a directory. Point to the dictory and it
>>> will work
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>>> sanjeev.sa...@gmail.com> wrote:
>>>
 Hello Everyone, I'm running into the following Hive external table
 issue.



 hive> CREATE EXTERNAL TABLE access(

  >   host STRING,

  >   identity STRING,

  >   user STRING,

  >   time STRING,

  >   request STRING,

  >   status STRING,

  >   size STRING,

  >   referer STRING,

  >   agent STRING)

  >   ROW FORMAT SERDE

 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

  >   WITH SERDEPROPERTIES (

  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)
 (-|\\[[^\\]]*\\])

 ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\")
 ([^ \"]*|\"[^\"]*\"))?",
>>

Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
Nitin,

Can you go through the thread with subject "S3/EMR Hive: Load contents of a
single file"  on Tue, 26 Mar, 17:11> at


http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/thread?1

 This gives the whole discussion about the topic of table location pointing
to a filename vs. directory.

Can you give your insight from this discussion and the discussion you
mentioned at stackoverflow link?

Regards,
Ramki.



On Thu, Jun 20, 2013 at 11:14 AM, Nitin Pawar wrote:

> Also see this JIRA
> https://issues.apache.org/jira/browse/HIVE-951
>
> I think issue you are facing is due to the JIRA
>
>
> On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:
>
>> Mark has answered this before
>>
>> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>>
>> If this link does not answer your question, do let us know
>>
>>
>> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar 
>> wrote:
>>
>>> Two issues:
>>>
>>> 1. I've created external tables in hive based on file location before
>>> and it work without any issue. It don't have to be a directory.
>>>
>>> 2. If there are more than one file in the directory, and you create
>>> external table based on directory then how the table knows that which file
>>> it need to look for the data?
>>>
>>> I tried to create the table based on directory, it created the table but
>>> all the rows were NULL.
>>>
>>> -Sanjeev
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar 
>>> wrote:
>>>
 in hive when you create table and use the location to refer hdfs path,
 that path is supposed to be a directory.
 If the directory is not existing it will try to create it and if its a
 file it will throw an error as its not a directory

 thats the error you are getting that location you referred is a file.
 Change it to the directory and see if that works for you


 On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar <
 sanjeev.sa...@gmail.com> wrote:

> I did mention in my mail the hdfs file exists in that location. See
> below
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> so the directory and file both exists.
>
>
> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar  > wrote:
>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>>
>> it clearly says its not a directory. Point to the dictory and it will
>> work
>>
>>
>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
>> sanjeev.sa...@gmail.com> wrote:
>>
>>> Hello Everyone, I'm running into the following Hive external table
>>> issue.
>>>
>>>
>>>
>>> hive> CREATE EXTERNAL TABLE access(
>>>
>>>  >   host STRING,
>>>
>>>  >   identity STRING,
>>>
>>>  >   user STRING,
>>>
>>>  >   time STRING,
>>>
>>>  >   request STRING,
>>>
>>>  >   status STRING,
>>>
>>>  >   size STRING,
>>>
>>>  >   referer STRING,
>>>
>>>  >   agent STRING)
>>>
>>>  >   ROW FORMAT SERDE
>>>
>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>
>>>  >   WITH SERDEPROPERTIES (
>>>
>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)
>>> (-|\\[[^\\]]*\\])
>>>
>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\")
>>> ([^ \"]*|\"[^\"]*\"))?",
>>>
>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>
>>> %7$s %8$s %9$s"
>>>
>>>  >   )
>>>
>>>  >   STORED AS TEXTFILE
>>>
>>>  >   LOCATION
>>>
>>> '/user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>>
>>> FAILED: Error in metadata:
>>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>
>>>
>>>
>>>
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Also see this JIRA
https://issues.apache.org/jira/browse/HIVE-951

I think issue you are facing is due to the JIRA


On Thu, Jun 20, 2013 at 11:41 PM, Nitin Pawar wrote:

> Mark has answered this before
>
> http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil
>
> If this link does not answer your question, do let us know
>
>
> On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar 
> wrote:
>
>> Two issues:
>>
>> 1. I've created external tables in hive based on file location before and
>> it work without any issue. It don't have to be a directory.
>>
>> 2. If there are more than one file in the directory, and you create
>> external table based on directory then how the table knows that which file
>> it need to look for the data?
>>
>> I tried to create the table based on directory, it created the table but
>> all the rows were NULL.
>>
>> -Sanjeev
>>
>>
>> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>>
>>> in hive when you create table and use the location to refer hdfs path,
>>> that path is supposed to be a directory.
>>> If the directory is not existing it will try to create it and if its a
>>> file it will throw an error as its not a directory
>>>
>>> thats the error you are getting that location you referred is a file.
>>> Change it to the directory and see if that works for you
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar >> > wrote:
>>>
 I did mention in my mail the hdfs file exists in that location. See
 below

 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 so the directory and file both exists.


 On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
 wrote:

> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
>
> it clearly says its not a directory. Point to the dictory and it will
> work
>
>
> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
> sanjeev.sa...@gmail.com> wrote:
>
>> Hello Everyone, I'm running into the following Hive external table
>> issue.
>>
>>
>>
>> hive> CREATE EXTERNAL TABLE access(
>>
>>  >   host STRING,
>>
>>  >   identity STRING,
>>
>>  >   user STRING,
>>
>>  >   time STRING,
>>
>>  >   request STRING,
>>
>>  >   status STRING,
>>
>>  >   size STRING,
>>
>>  >   referer STRING,
>>
>>  >   agent STRING)
>>
>>  >   ROW FORMAT SERDE
>>
>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>
>>  >   WITH SERDEPROPERTIES (
>>
>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*)
>> (-|\\[[^\\]]*\\])
>>
>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\")
>> ([^ \"]*|\"[^\"]*\"))?",
>>
>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>
>> %7$s %8$s %9$s"
>>
>>  >   )
>>
>>  >   STORED AS TEXTFILE
>>
>>  >   LOCATION
>>
>> '/user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>
>> FAILED: Error in metadata:
>>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>>
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>>
>>
>> I've download the serde2 jar file too and install it in
>> /usr/lib/hive/lib/hive-json-serde-0.2.jar and I've bounced all the hadoop
>> services after that.
>>
>>
>>
>> I even added the jar file manually in hive and run the above sql but
>> still failing.
>>
>> ive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>
>>  > ;
>>
>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.

Re: Hive External Table issue

2013-06-20 Thread Ramki Palle
1. I was under the impression that you cannot point the table location to a
file. But, it looks like it works. Please see the discussion in the thread
 http://mail-archives.apache.org/mod_mbox/hive-user/201303.mbox/%
3c556325346ca26341b6f0530e07f90d96017084360...@gbgh-exch-cms.sig.ads%3e

2. If there is more than one file in the directory, your query gets the
data from all the files in that directory.

In your case, the regex may not be parsing the data properly.

Regards,
Ramki.


On Thu, Jun 20, 2013 at 11:03 AM, sanjeev sagar wrote:

> Two issues:
>
> 1. I've created external tables in hive based on file location before and
> it work without any issue. It don't have to be a directory.
>
> 2. If there are more than one file in the directory, and you create
> external table based on directory then how the table knows that which file
> it need to look for the data?
>
> I tried to create the table based on directory, it created the table but
> all the rows were NULL.
>
> -Sanjeev
>
>
> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>
>> in hive when you create table and use the location to refer hdfs path,
>> that path is supposed to be a directory.
>> If the directory is not existing it will try to create it and if its a
>> file it will throw an error as its not a directory
>>
>> thats the error you are getting that location you referred is a file.
>> Change it to the directory and see if that works for you
>>
>>
>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
>> wrote:
>>
>>> I did mention in my mail the hdfs file exists in that location. See below
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> so the directory and file both exists.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
>>> wrote:
>>>
 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)


 it clearly says its not a directory. Point to the dictory and it will
 work


 On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
 sanjeev.sa...@gmail.com> wrote:

> Hello Everyone, I'm running into the following Hive external table
> issue.
>
>
>
> hive> CREATE EXTERNAL TABLE access(
>
>  >   host STRING,
>
>  >   identity STRING,
>
>  >   user STRING,
>
>  >   time STRING,
>
>  >   request STRING,
>
>  >   status STRING,
>
>  >   size STRING,
>
>  >   referer STRING,
>
>  >   agent STRING)
>
>  >   ROW FORMAT SERDE
>
> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>
>  >   WITH SERDEPROPERTIES (
>
>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>
> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
> \"]*|\"[^\"]*\"))?",
>
>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>
> %7$s %8$s %9$s"
>
>  >   )
>
>  >   STORED AS TEXTFILE
>
>  >   LOCATION
>
> '/user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>
> FAILED: Error in metadata:
>
> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
>
>
> I've download the serde2 jar file too and install it in
> /usr/lib/hive/lib/hive-json-serde-0.2.jar and I've bounced all the hadoop
> services after that.
>
>
>
> I even added the jar file manually in hive and run the above sql but
> still failing.
>
> ive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>  > ;
>
> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>
>
> Any help would be highly appreciable.
>
>
>
> -

Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
Mark has answered this before
http://stackoverflow.com/questions/11269203/when-creating-an-external-table-in-hive-can-i-point-the-location-to-specific-fil

If this link does not answer your question, do let us know


On Thu, Jun 20, 2013 at 11:33 PM, sanjeev sagar wrote:

> Two issues:
>
> 1. I've created external tables in hive based on file location before and
> it work without any issue. It don't have to be a directory.
>
> 2. If there are more than one file in the directory, and you create
> external table based on directory then how the table knows that which file
> it need to look for the data?
>
> I tried to create the table based on directory, it created the table but
> all the rows were NULL.
>
> -Sanjeev
>
>
> On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:
>
>> in hive when you create table and use the location to refer hdfs path,
>> that path is supposed to be a directory.
>> If the directory is not existing it will try to create it and if its a
>> file it will throw an error as its not a directory
>>
>> thats the error you are getting that location you referred is a file.
>> Change it to the directory and see if that works for you
>>
>>
>> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
>> wrote:
>>
>>> I did mention in my mail the hdfs file exists in that location. See below
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> so the directory and file both exists.
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar 
>>> wrote:
>>>
 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)


 it clearly says its not a directory. Point to the dictory and it will
 work


 On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar <
 sanjeev.sa...@gmail.com> wrote:

> Hello Everyone, I'm running into the following Hive external table
> issue.
>
>
>
> hive> CREATE EXTERNAL TABLE access(
>
>  >   host STRING,
>
>  >   identity STRING,
>
>  >   user STRING,
>
>  >   time STRING,
>
>  >   request STRING,
>
>  >   status STRING,
>
>  >   size STRING,
>
>  >   referer STRING,
>
>  >   agent STRING)
>
>  >   ROW FORMAT SERDE
>
> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>
>  >   WITH SERDEPROPERTIES (
>
>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>
> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
> \"]*|\"[^\"]*\"))?",
>
>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>
> %7$s %8$s %9$s"
>
>  >   )
>
>  >   STORED AS TEXTFILE
>
>  >   LOCATION
>
> '/user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>
> FAILED: Error in metadata:
>
> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
>
>
> I've download the serde2 jar file too and install it in
> /usr/lib/hive/lib/hive-json-serde-0.2.jar and I've bounced all the hadoop
> services after that.
>
>
>
> I even added the jar file manually in hive and run the above sql but
> still failing.
>
> ive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>  > ;
>
> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>
>
> Any help would be highly appreciable.
>
>
>
> -Sanjeev
>
>
>
>
>
>
>
>
>
> --
> Sanjeev Sagar
>
> *"**Separate yourself from everything that separates you from others
> !" - Nirankari Baba Hardev Singh ji *
>
> **
>



 --
 Nitin 

Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
Two issues:

1. I've created external tables in hive based on a file location before and
it worked without any issue. It doesn't have to be a directory.

2. If there is more than one file in the directory, and you create the
external table based on the directory, then how does the table know which file
it needs to look at for the data?

I tried to create the table based on the directory; it created the table but
all the rows were NULL.

-Sanjeev


On Thu, Jun 20, 2013 at 10:30 AM, Nitin Pawar wrote:

> in hive when you create table and use the location to refer hdfs path,
> that path is supposed to be a directory.
> If the directory is not existing it will try to create it and if its a
> file it will throw an error as its not a directory
>
> thats the error you are getting that location you referred is a file.
> Change it to the directory and see if that works for you
>
>
> On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar 
> wrote:
>
>> I did mention in my mail the hdfs file exists in that location. See below
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> so the directory and file both exists.
>>
>>
>> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:
>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>>
>>> it clearly says its not a directory. Point to the dictory and it will
>>> work
>>>
>>>
>>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar >> > wrote:
>>>
 Hello Everyone, I'm running into the following Hive external table
 issue.



 hive> CREATE EXTERNAL TABLE access(

  >   host STRING,

  >   identity STRING,

  >   user STRING,

  >   time STRING,

  >   request STRING,

  >   status STRING,

  >   size STRING,

  >   referer STRING,

  >   agent STRING)

  >   ROW FORMAT SERDE

 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

  >   WITH SERDEPROPERTIES (

  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])

 ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
 \"]*|\"[^\"]*\"))?",

  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s

 %7$s %8$s %9$s"

  >   )

  >   STORED AS TEXTFILE

  >   LOCATION

 '/user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

 FAILED: Error in metadata:

 MetaException(message:hdfs://
 h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 is not a directory or unable to create one)

 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.DDLTask





 In HDFS: file exists



 hadoop fs -ls

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

 Found 1 items

 -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

 /user/flume/events/request_logs/
 ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



 I've download the serde2 jar file too and install it in
 /usr/lib/hive/lib/hive-json-serde-0.2.jar and I've bounced all the hadoop
 services after that.



 I even added the jar file manually in hive and run the above sql but
 still failing.

 ive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar

  > ;

 Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
 resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar



 Any help would be highly appreciable.



 -Sanjeev









 --
 Sanjeev Sagar

 *"**Separate yourself from everything that separates you from others !"- 
 Nirankari
 Baba Hardev Singh ji *

 **

>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>>
>> --
>> Sanjeev Sagar
>>
>> *"**Separate yourself from everything that separates you from others !"- 
>> Nirankari
>> Baba Hardev Singh ji *
>>
>> **
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Sanjeev Sagar

*"**Separate yourself from everything that separates you from others
!" - Nirankari
Baba Hardev Singh ji *

**


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
In Hive, when you create a table and use LOCATION to refer to an HDFS path, that
path is supposed to be a directory.
If the directory does not exist Hive will try to create it, and if the path is a
file it will throw an error because it is not a directory.

That is the error you are getting: the location you referred to is a file.
Change it to the directory and see if that works for you.
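
For reference, a minimal sketch of the corrected DDL: it is the statement from the
thread unchanged except that LOCATION now points at the day-level directory rather
than at the FlumeData file (the directory path is inferred from the file path
quoted above):

CREATE EXTERNAL TABLE access(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION '/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/';

Hive then reads every file under that directory, including the
FlumeData.1371144648033 file listed above.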


On Thu, Jun 20, 2013 at 10:57 PM, sanjeev sagar wrote:

> I did mention in my mail the hdfs file exists in that location. See below
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> so the directory and file both exist.
>
>
> On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:
>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>>
>> it clearly says it's not a directory. Point to the directory and it will
>> work
>>
>>
>> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
>> wrote:
>>
>>> Hello Everyone, I'm running into the following Hive external table issue.
>>>
>>>
>>>
>>> hive> CREATE EXTERNAL TABLE access(
>>>
>>>  >   host STRING,
>>>
>>>  >   identity STRING,
>>>
>>>  >   user STRING,
>>>
>>>  >   time STRING,
>>>
>>>  >   request STRING,
>>>
>>>  >   status STRING,
>>>
>>>  >   size STRING,
>>>
>>>  >   referer STRING,
>>>
>>>  >   agent STRING)
>>>
>>>  >   ROW FORMAT SERDE
>>>
>>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>>
>>>  >   WITH SERDEPROPERTIES (
>>>
>>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>>
>>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>>> \"]*|\"[^\"]*\"))?",
>>>
>>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>>
>>> %7$s %8$s %9$s"
>>>
>>>  >   )
>>>
>>>  >   STORED AS TEXTFILE
>>>
>>>  >   LOCATION
>>>
>>> '/user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>>
>>> FAILED: Error in metadata:
>>>
>>> MetaException(message:hdfs://
>>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> is not a directory or unable to create one)
>>>
>>> FAILED: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.DDLTask
>>>
>>>
>>>
>>>
>>>
>>> In HDFS: file exists
>>>
>>>
>>>
>>> hadoop fs -ls
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>> Found 1 items
>>>
>>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>>
>>> /user/flume/events/request_logs/
>>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>>
>>>
>>>
>>> I've downloaded the serde2 jar file too, installed it at
>>> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and bounced all the hadoop
>>> services after that.
>>>
>>>
>>>
>>> I even added the jar file manually in hive and ran the above sql, but it
>>> is still failing.
>>>
>>> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>
>>>  > ;
>>>
>>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>>
>>>
>>>
>>> Any help would be highly appreciated.
>>>
>>>
>>>
>>> -Sanjeev
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sanjeev Sagar
>>>
>>> *"**Separate yourself from everything that separates you from others !"- 
>>> Nirankari
>>> Baba Hardev Singh ji *
>>>
>>> **
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> Sanjeev Sagar
>
> *"**Separate yourself from everything that separates you from others !" - 
> Nirankari
> Baba Hardev Singh ji *
>
> **
>



-- 
Nitin Pawar


Re: Hive External Table issue

2013-06-20 Thread sanjeev sagar
I did mention in my mail the hdfs file exists in that location. See below

In HDFS: file exists



hadoop fs -ls

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

Found 1 items

-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

so the directory and file both exist.


On Thu, Jun 20, 2013 at 10:24 AM, Nitin Pawar wrote:

> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
>
> it clearly says it's not a directory. Point to the directory and it will work
>
>
> On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar 
> wrote:
>
>> Hello Everyone, I'm running into the following Hive external table issue.
>>
>>
>>
>> hive> CREATE EXTERNAL TABLE access(
>>
>>  >   host STRING,
>>
>>  >   identity STRING,
>>
>>  >   user STRING,
>>
>>  >   time STRING,
>>
>>  >   request STRING,
>>
>>  >   status STRING,
>>
>>  >   size STRING,
>>
>>  >   referer STRING,
>>
>>  >   agent STRING)
>>
>>  >   ROW FORMAT SERDE
>>
>> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>>
>>  >   WITH SERDEPROPERTIES (
>>
>>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>>
>> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
>> \"]*|\"[^\"]*\"))?",
>>
>>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>>
>> %7$s %8$s %9$s"
>>
>>  >   )
>>
>>  >   STORED AS TEXTFILE
>>
>>  >   LOCATION
>>
>> '/user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>>
>> FAILED: Error in metadata:
>>
>> MetaException(message:hdfs://
>> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> is not a directory or unable to create one)
>>
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>>
>>
>> In HDFS: file exists
>>
>>
>>
>> hadoop fs -ls
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>> Found 1 items
>>
>> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>>
>> /user/flume/events/request_logs/
>> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>>
>>
>>
>> I've downloaded the serde2 jar file too, installed it at
>> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and bounced all the hadoop
>> services after that.
>>
>>
>>
>> I even added the jar file manually in hive and ran the above sql, but it
>> is still failing.
>>
>> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>
>>  > ;
>>
>> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
>> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>>
>>
>>
>> Any help would be highly appreciated.
>>
>>
>>
>> -Sanjeev
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sanjeev Sagar
>>
>> *"**Separate yourself from everything that separates you from others !"- 
>> Nirankari
>> Baba Hardev Singh ji *
>>
>> **
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Sanjeev Sagar

*"**Separate yourself from everything that separates you from others
!" - Nirankari
Baba Hardev Singh ji *

**


Re: Hive External Table issue

2013-06-20 Thread Nitin Pawar
MetaException(message:hdfs://
h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

is not a directory or unable to create one)


it clearly says it's not a directory. Point to the directory and it will work


On Thu, Jun 20, 2013 at 10:52 PM, sanjeev sagar wrote:

> Hello Everyone, I'm running into the following Hive external table issue.
>
>
>
> hive> CREATE EXTERNAL TABLE access(
>
>  >   host STRING,
>
>  >   identity STRING,
>
>  >   user STRING,
>
>  >   time STRING,
>
>  >   request STRING,
>
>  >   status STRING,
>
>  >   size STRING,
>
>  >   referer STRING,
>
>  >   agent STRING)
>
>  >   ROW FORMAT SERDE
>
> 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>
>  >   WITH SERDEPROPERTIES (
>
>  >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])
>
> ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
> \"]*|\"[^\"]*\"))?",
>
>  >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s
>
> %7$s %8$s %9$s"
>
>  >   )
>
>  >   STORED AS TEXTFILE
>
>  >   LOCATION
>
> '/user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';
>
> FAILED: Error in metadata:
>
> MetaException(message:hdfs://
> h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> is not a directory or unable to create one)
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
>
>
>
> In HDFS: file exists
>
>
>
> hadoop fs -ls
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
> Found 1 items
>
> -rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14
>
> /user/flume/events/request_logs/
> ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033
>
>
>
> I've downloaded the serde2 jar file too, installed it at
> /usr/lib/hive/lib/hive-json-serde-0.2.jar, and bounced all the hadoop
> services after that.
>
>
>
> I even added the jar file manually in hive and ran the above sql, but it is
> still failing.
>
> hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>  > ;
>
> Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
> resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar
>
>
>
> Any help would be highly appreciated.
>
>
>
> -Sanjeev
>
>
>
>
>
>
>
>
>
> --
> Sanjeev Sagar
>
> *"**Separate yourself from everything that separates you from others !" - 
> Nirankari
> Baba Hardev Singh ji *
>
> **
>



-- 
Nitin Pawar


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Good eyes, Ramki! Thanks, this "directory" in place of the filename appears to
be working. The script is getting loaded now using the "Attempt two" form,
i.e. hivetry/classifier_wf.py as the script path.
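
For the archives, a rough sketch of the combination that ended up working, using
the paths and table names from earlier in this thread:

add file /opt/am/ver/1.0/hive/hivetry;

from (select transform (aappname, qappname)
      using 'hivetry/classifier_wf.py' as (aappname2 string, qappname2 string)
      from eqx) o
insert overwrite table c
select o.aappname2, o.qappname2;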

thanks again.

stephenb


2013/6/20 Ramki Palle 

> In the *Attempt two, *are you not supposed to use "hivetry" as the
> directory?
>
> May be you should try giving the full path "
> /opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works.
>
> Regards,
> Ramki.
>
>
> On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch  wrote:
>
>>
>> Stephen:  would you be willing to share an example of specifying a
>> "directory" as the  add "file" target?I have not seen this working
>>
>> I have attempted to use it as follows:
>>
>> *We will access a script within the "hivetry" directory located here:*
>> hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
>> -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
>> /opt/am/ver/1.0/hive/hivetry/classifier_wf.py
>>
>> *Add the directory  to hive:*
>> hive> add file /opt/am/ver/1.0/hive/hivetry;
>> Added resource: /opt/am/ver/1.0/hive/hivetry
>>
>> *Attempt to run transform query using that script:*
>> *
>> *
>> *Attempt one: use the script name unqualified:*
>>
>> hive>from (select transform (aappname,qappname) using 'classifier_wf.py' 
>> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table 
>> c select o.aappname2, o.qappname2;
>>
>>
>> (Failed:   Caused by: java.io.IOException: Cannot run program 
>> "classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>>
>>
>> *Attempt two: use the script name with the directory name prefix: *
>>
>> hive>from (select transform (aappname,qappname) using 
>> 'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o 
>> insert overwrite table c select o.aappname2, o.qappname2;
>>
>>
>> (Failed:   Caused by: java.io.IOException: Cannot run program 
>> "hive/classifier_wf.py": java.io.IOException: error=2, No such file or 
>> directory)
>>
>>
>>
>>
>>
>> 2013/6/20 Stephen Sprague 
>>
>>> yeah.  the archive isn't unpacked on the remote side. I think add
>>> archive is mostly used for finding java packages since CLASSPATH will
>>> reference the archive (and as such there is no need to expand it.)
>>>
>>>
>>> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch wrote:
>>>
 thx for the tip on "add " where  is directory. I will try
 that.


 2013/6/20 Stephen Sprague 

> i personally only know of adding a .jar file via add archive but my
> experience there is very limited.  i believe if you 'add file' and the 
> file
> is a directory it'll recursively take everything underneath but i know of
> nothing that inflates or un tars things on the remote end automatically.
>
> i would 'add file' your python script and then within that untar your
> tarball to get at your model data. its just the matter of figuring out the
> path to that tarball that's kinda up in the air when its added as 'add
> file'.  Yeah. "local downlooads directory".  What's the literal path is
> what i'd like to know. :)
>
>
> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch wrote:
>
>>
>> @Stephen:  given the  'relative' path for hive is from a local
>> downloads directory on each local tasktracker in the cluster,  it was my
>> thought that if the archive were actually being expanded then
>> somedir/somefileinthearchive  should work.  I will go ahead and test this
>> assumption.
>>
>> In the meantime, is there any facility available in hive for making
>> archived files available to hive jobs?  archive or hadoop archive ("har")
>> etc?
>>
>>
>> 2013/6/20 Stephen Sprague 
>>
>>> what would be interesting would be to run a little experiment and
>>> find out what the default PATH is on your data nodes.  How much of a 
>>> pain
>>> would it be to run a little python script to print to stderr the value 
>>> of
>>> the environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>>>
>>> that's of course going through normal channels of "add file".
>>>
>>> the thing is given you're using a relative path "hive/parse_qx.py"
>>> you need to know what the "current directory" is when the process runs 
>>> on
>>> the data nodes.
>>>
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch 
>>> wrote:
>>>

 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation 
 steps .

 It seems the "add archive"  does not make the entries unarchived
 and thus available directly on the default file path - and that is 
 what we
 are looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;

Hive External Table issue

2013-06-20 Thread sanjeev sagar
Hello Everyone, I'm running into the following Hive external table issue.



hive> CREATE EXTERNAL TABLE access(

 >   host STRING,

 >   identity STRING,

 >   user STRING,

 >   time STRING,

 >   request STRING,

 >   status STRING,

 >   size STRING,

 >   referer STRING,

 >   agent STRING)

 >   ROW FORMAT SERDE

'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

 >   WITH SERDEPROPERTIES (

 >  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\])

([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^
\"]*|\"[^\"]*\"))?",

 >   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s

%7$s %8$s %9$s"

 >   )

 >   STORED AS TEXTFILE

 >   LOCATION

'/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033';

FAILED: Error in metadata:

MetaException(message:hdfs://
h1.vgs.mypoints.com:8020/user/flume/events/request_logs/ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

is not a directory or unable to create one)

FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask





In HDFS: file exists



hadoop fs -ls

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033

Found 1 items

-rw-r--r--   3 hdfs supergroup 2242037226 2013-06-13 11:14

/user/flume/events/request_logs/
ar1.vgs.mypoints.com/13-06-13/FlumeData.1371144648033



I've downloaded the serde2 jar file too, installed it at
/usr/lib/hive/lib/hive-json-serde-0.2.jar, and bounced all the hadoop
services after that.



I even added the jar file manually in hive and ran the above sql, but it is still
failing.

hive> add jar /usr/lib/hive/lib/hive-json-serde-0.2.jar

 > ;

Added /usr/lib/hive/lib/hive-json-serde-0.2.jar to class path Added
resource: /usr/lib/hive/lib/hive-json-serde-0.2.jar



Any help would be highly appreciated.



-Sanjeev









-- 
Sanjeev Sagar

*"**Separate yourself from everything that separates you from others
!" - Nirankari
Baba Hardev Singh ji *

**


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Ramki Palle
In *Attempt two*, are you not supposed to use "hivetry" as the
directory prefix?

Maybe you should try giving the full path
"/opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works.

Regards,
Ramki.


On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch  wrote:

>
> Stephen:  would you be willing to share an example of specifying a
> "directory" as the  add "file" target?I have not seen this working
>
> I have attempted to use it as follows:
>
> *We will access a script within the "hivetry" directory located here:*
> hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
> -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
> /opt/am/ver/1.0/hive/hivetry/classifier_wf.py
>
> *Add the directory  to hive:*
> hive> add file /opt/am/ver/1.0/hive/hivetry;
> Added resource: /opt/am/ver/1.0/hive/hivetry
>
> *Attempt to run transform query using that script:*
> *
> *
> *Attempt one: use the script name unqualified:*
>
> hive>from (select transform (aappname,qappname) using 'classifier_wf.py' 
> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table c 
> select o.aappname2, o.qappname2;
>
> (Failed:   Caused by: java.io.IOException: Cannot run program 
> "classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>
>
> *Attempt two: use the script name with the directory name prefix: *
>
> hive>from (select transform (aappname,qappname) using 
> 'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o 
> insert overwrite table c select o.aappname2, o.qappname2;
>
> (Failed:   Caused by: java.io.IOException: Cannot run program 
> "hive/classifier_wf.py": java.io.IOException: error=2, No such file or 
> directory)
>
>
>
>
> 2013/6/20 Stephen Sprague 
>
>> yeah.  the archive isn't unpacked on the remote side. I think add archive
>> is mostly used for finding java packages since CLASSPATH will reference the
>> archive (and as such there is no need to expand it.)
>>
>>
>> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch wrote:
>>
>>> thx for the tip on "add " where  is directory. I will try
>>> that.
>>>
>>>
>>> 2013/6/20 Stephen Sprague 
>>>
 i personally only know of adding a .jar file via add archive but my
 experience there is very limited.  i believe if you 'add file' and the file
 is a directory it'll recursively take everything underneath but i know of
 nothing that inflates or un tars things on the remote end automatically.

 i would 'add file' your python script and then within that untar your
 tarball to get at your model data. its just the matter of figuring out the
 path to that tarball that's kinda up in the air when its added as 'add
 file'.  Yeah. "local downlooads directory".  What's the literal path is
 what i'd like to know. :)


 On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch wrote:

>
> @Stephen:  given the  'relative' path for hive is from a local
> downloads directory on each local tasktracker in the cluster,  it was my
> thought that if the archive were actually being expanded then
> somedir/somefileinthearchive  should work.  I will go ahead and test this
> assumption.
>
> In the meantime, is there any facility available in hive for making
> archived files available to hive jobs?  archive or hadoop archive ("har")
> etc?
>
>
> 2013/6/20 Stephen Sprague 
>
>> what would be interesting would be to run a little experiment and
>> find out what the default PATH is on your data nodes.  How much of a pain
>> would it be to run a little python script to print to stderr the value of
>> the environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>>
>> that's of course going through normal channels of "add file".
>>
>> the thing is given you're using a relative path "hive/parse_qx.py"
>> you need to know what the "current directory" is when the process runs on
>> the data nodes.
>>
>>
>>
>>
>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:
>>
>>>
>>> We have a few dozen files that need to be made available to all
>>> mappers/reducers in the cluster while running  hive transformation 
>>> steps .
>>>
>>> It seems the "add archive"  does not make the entries unarchived and
>>> thus available directly on the default file path - and that is what we 
>>> are
>>> looking for.
>>>
>>> To illustrate:
>>>
>>>add file modelfile.1;
>>>add file modelfile.2;
>>>..
>>> add file modelfile.N;
>>>
>>>   Then, our model that is invoked during the transformation step *does
>>> *have correct access to its model files in the defaul path.
>>>
>>> But .. those model files take low *minutes* to all load..
>>>
>>> instead when we try:
>>>add archive  modelArchive.tgz.
>>>
>>> The problem is the archive does not get exploded apparently ..

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Stephen:  would you be willing to share an example of specifying a
"directory" as the  add "file" target?I have not seen this working

I have attempted to use it as follows:

*We will access a script within the "hivetry" directory located here:*
hive> ! ls -l  /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
-rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37
/opt/am/ver/1.0/hive/hivetry/classifier_wf.py

*Add the directory  to hive:*
hive> add file /opt/am/ver/1.0/hive/hivetry;
Added resource: /opt/am/ver/1.0/hive/hivetry

*Attempt to run transform query using that script:*
*
*
*Attempt one: use the script name unqualified:*

hive>from (select transform (aappname,qappname) using
'classifier_wf.py' as (aappname2 string, qappname2 string) from eqx )
o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
"classifier_wf.py": java.io.IOException: error=2, No such file or
directory)


*Attempt two: use the script name with the directory name prefix: *
hive>from (select transform (aappname,qappname) using
'hive/classifier_wf.py' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

(Failed:   Caused by: java.io.IOException: Cannot run program
"hive/classifier_wf.py": java.io.IOException: error=2, No such file or
directory)




2013/6/20 Stephen Sprague 

> yeah.  the archive isn't unpacked on the remote side. I think add archive
> is mostly used for finding java packages since CLASSPATH will reference the
> archive (and as such there is no need to expand it.)
>
>
> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch  wrote:
>
>> thx for the tip on "add " where  is directory. I will try
>> that.
>>
>>
>> 2013/6/20 Stephen Sprague 
>>
>>> i personally only know of adding a .jar file via add archive but my
>>> experience there is very limited.  i believe if you 'add file' and the file
>>> is a directory it'll recursively take everything underneath but i know of
>>> nothing that inflates or un tars things on the remote end automatically.
>>>
>>> i would 'add file' your python script and then within that untar your
>>> tarball to get at your model data. its just the matter of figuring out the
>>> path to that tarball that's kinda up in the air when its added as 'add
>>> file'.  Yeah. "local downlooads directory".  What's the literal path is
>>> what i'd like to know. :)
>>>
>>>
>>> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch wrote:
>>>

 @Stephen:  given the  'relative' path for hive is from a local
 downloads directory on each local tasktracker in the cluster,  it was my
 thought that if the archive were actually being expanded then
 somedir/somefileinthearchive  should work.  I will go ahead and test this
 assumption.

 In the meantime, is there any facility available in hive for making
 archived files available to hive jobs?  archive or hadoop archive ("har")
 etc?


 2013/6/20 Stephen Sprague 

> what would be interesting would be to run a little experiment and find
> out what the default PATH is on your data nodes.  How much of a pain would
> it be to run a little python script to print to stderr the value of the
> environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>
> that's of course going through normal channels of "add file".
>
> the thing is given you're using a relative path "hive/parse_qx.py"
> you need to know what the "current directory" is when the process runs on
> the data nodes.
>
>
>
>
> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:
>
>>
>> We have a few dozen files that need to be made available to all
>> mappers/reducers in the cluster while running  hive transformation steps 
>> .
>>
>> It seems the "add archive"  does not make the entries unarchived and
>> thus available directly on the default file path - and that is what we 
>> are
>> looking for.
>>
>> To illustrate:
>>
>>add file modelfile.1;
>>add file modelfile.2;
>>..
>> add file modelfile.N;
>>
>>   Then, our model that is invoked during the transformation step *does
>> *have correct access to its model files in the defaul path.
>>
>> But .. those model files take low *minutes* to all load..
>>
>> instead when we try:
>>add archive  modelArchive.tgz.
>>
>> The problem is the archive does not get exploded apparently ..
>>
>> I have an archive for example that contains shell scripts under the
>> "hive" directory stored inside.  I am *not *able to access
>> hive/my-shell-script.sh  after adding the archive. Specifically the
>> following fails:
>>
>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
>> appminer/bin/launch-quixey_to_xml.sh
>>
>> fr

Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
Yeah, the archive isn't unpacked on the remote side. I think add archive
is mostly used for finding Java packages, since the CLASSPATH will reference
the archive (and as such there is no need to expand it).


On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch  wrote:

> thx for the tip on "add " where  is directory. I will try that.
>
>
> 2013/6/20 Stephen Sprague 
>
>> i personally only know of adding a .jar file via add archive but my
>> experience there is very limited.  i believe if you 'add file' and the file
>> is a directory it'll recursively take everything underneath but i know of
>> nothing that inflates or un tars things on the remote end automatically.
>>
>> i would 'add file' your python script and then within that untar your
>> tarball to get at your model data. its just the matter of figuring out the
>> path to that tarball that's kinda up in the air when its added as 'add
>> file'.  Yeah. "local downlooads directory".  What's the literal path is
>> what i'd like to know. :)
>>
>>
>> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch wrote:
>>
>>>
>>> @Stephen:  given the  'relative' path for hive is from a local downloads
>>> directory on each local tasktracker in the cluster,  it was my thought that
>>> if the archive were actually being expanded then
>>> somedir/somefileinthearchive  should work.  I will go ahead and test this
>>> assumption.
>>>
>>> In the meantime, is there any facility available in hive for making
>>> archived files available to hive jobs?  archive or hadoop archive ("har")
>>> etc?
>>>
>>>
>>> 2013/6/20 Stephen Sprague 
>>>
 what would be interesting would be to run a little experiment and find
 out what the default PATH is on your data nodes.  How much of a pain would
 it be to run a little python script to print to stderr the value of the
 environmental variable $PATH and $PWD (or the shell command 'pwd') ?

 that's of course going through normal channels of "add file".

 the thing is given you're using a relative path "hive/parse_qx.py"  you
 need to know what the "current directory" is when the process runs on the
 data nodes.




 On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:

>
> We have a few dozen files that need to be made available to all
> mappers/reducers in the cluster while running  hive transformation steps .
>
> It seems the "add archive"  does not make the entries unarchived and
> thus available directly on the default file path - and that is what we are
> looking for.
>
> To illustrate:
>
>add file modelfile.1;
>add file modelfile.2;
>..
> add file modelfile.N;
>
>   Then, our model that is invoked during the transformation step *does
> *have correct access to its model files in the defaul path.
>
> But .. those model files take low *minutes* to all load..
>
> instead when we try:
>add archive  modelArchive.tgz.
>
> The problem is the archive does not get exploded apparently ..
>
> I have an archive for example that contains shell scripts under the
> "hive" directory stored inside.  I am *not *able to access
> hive/my-shell-script.sh  after adding the archive. Specifically the
> following fails:
>
> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
> appminer/bin/launch-quixey_to_xml.sh
>
> from (select transform (aappname,qappname)
> *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
> from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
>
> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No 
> such file or directory
>
>
>
>

>>>
>>
>


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
Thanks for the tip on "add file <dir>" where <dir> is a directory. I will try that.


2013/6/20 Stephen Sprague 

> i personally only know of adding a .jar file via add archive but my
> experience there is very limited.  i believe if you 'add file' and the file
> is a directory it'll recursively take everything underneath but i know of
> nothing that inflates or un tars things on the remote end automatically.
>
> i would 'add file' your python script and then within that untar your
> tarball to get at your model data. its just the matter of figuring out the
> path to that tarball that's kinda up in the air when its added as 'add
> file'.  Yeah. "local downlooads directory".  What's the literal path is
> what i'd like to know. :)
>
>
> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch  wrote:
>
>>
>> @Stephen:  given the  'relative' path for hive is from a local downloads
>> directory on each local tasktracker in the cluster,  it was my thought that
>> if the archive were actually being expanded then
>> somedir/somefileinthearchive  should work.  I will go ahead and test this
>> assumption.
>>
>> In the meantime, is there any facility available in hive for making
>> archived files available to hive jobs?  archive or hadoop archive ("har")
>> etc?
>>
>>
>> 2013/6/20 Stephen Sprague 
>>
>>> what would be interesting would be to run a little experiment and find
>>> out what the default PATH is on your data nodes.  How much of a pain would
>>> it be to run a little python script to print to stderr the value of the
>>> environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>>>
>>> that's of course going through normal channels of "add file".
>>>
>>> the thing is given you're using a relative path "hive/parse_qx.py"  you
>>> need to know what the "current directory" is when the process runs on the
>>> data nodes.
>>>
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:
>>>

 We have a few dozen files that need to be made available to all
 mappers/reducers in the cluster while running  hive transformation steps .

 It seems the "add archive"  does not make the entries unarchived and
 thus available directly on the default file path - and that is what we are
 looking for.

 To illustrate:

add file modelfile.1;
add file modelfile.2;
..
 add file modelfile.N;

   Then, our model that is invoked during the transformation step *does
 *have correct access to its model files in the defaul path.

 But .. those model files take low *minutes* to all load..

 instead when we try:
add archive  modelArchive.tgz.

 The problem is the archive does not get exploded apparently ..

 I have an archive for example that contains shell scripts under the
 "hive" directory stored inside.  I am *not *able to access
 hive/my-shell-script.sh  after adding the archive. Specifically the
 following fails:

 $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
 -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
 appminer/bin/launch-quixey_to_xml.sh

 from (select transform (aappname,qappname)
 *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
 from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

 Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No 
 such file or directory




>>>
>>
>


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
I personally only know of adding a .jar file via add archive, but my
experience there is very limited. I believe if you 'add file' and the file
is a directory it'll recursively take everything underneath, but I know of
nothing that inflates or untars things on the remote end automatically.

I would 'add file' your python script and then, within that script, untar your
tarball to get at your model data. It's just a matter of figuring out the
path to that tarball, which is kinda up in the air when it's added as 'add
file'. Yeah, "local downloads directory". What's the literal path? That's
what I'd like to know. :)
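
As a rough sketch of that workaround (the file names and paths below are borrowed
from elsewhere in the thread for illustration, not a tested recipe): ship both the
script and the tarball as plain files, and have the script unpack the tarball in
its working directory before it starts reading rows.

add file /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
add file /opt/am/ver/1.0/hive/modelArchive.tgz;

-- classifier_wf.py would then start with something along the lines of:
--   import tarfile
--   tarfile.open("modelArchive.tgz").extractall(".")
-- before reading stdin, since both added resources should land in the
-- task's working directory.

from (select transform (aappname, qappname)
      using 'classifier_wf.py' as (aappname2 string, qappname2 string)
      from eqx) o
insert overwrite table c
select o.aappname2, o.qappname2;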


On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch  wrote:

>
> @Stephen:  given the  'relative' path for hive is from a local downloads
> directory on each local tasktracker in the cluster,  it was my thought that
> if the archive were actually being expanded then
> somedir/somefileinthearchive  should work.  I will go ahead and test this
> assumption.
>
> In the meantime, is there any facility available in hive for making
> archived files available to hive jobs?  archive or hadoop archive ("har")
> etc?
>
>
> 2013/6/20 Stephen Sprague 
>
>> what would be interesting would be to run a little experiment and find
>> out what the default PATH is on your data nodes.  How much of a pain would
>> it be to run a little python script to print to stderr the value of the
>> environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>>
>> that's of course going through normal channels of "add file".
>>
>> the thing is given you're using a relative path "hive/parse_qx.py"  you
>> need to know what the "current directory" is when the process runs on the
>> data nodes.
>>
>>
>>
>>
>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:
>>
>>>
>>> We have a few dozen files that need to be made available to all
>>> mappers/reducers in the cluster while running  hive transformation steps .
>>>
>>> It seems the "add archive"  does not make the entries unarchived and
>>> thus available directly on the default file path - and that is what we are
>>> looking for.
>>>
>>> To illustrate:
>>>
>>>add file modelfile.1;
>>>add file modelfile.2;
>>>..
>>> add file modelfile.N;
>>>
>>>   Then, our model that is invoked during the transformation step *does *have
>>> correct access to its model files in the defaul path.
>>>
>>> But .. those model files take low *minutes* to all load..
>>>
>>> instead when we try:
>>>add archive  modelArchive.tgz.
>>>
>>> The problem is the archive does not get exploded apparently ..
>>>
>>> I have an archive for example that contains shell scripts under the
>>> "hive" directory stored inside.  I am *not *able to access
>>> hive/my-shell-script.sh  after adding the archive. Specifically the
>>> following fails:
>>>
>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
>>> appminer/bin/launch-quixey_to_xml.sh
>>>
>>> from (select transform (aappname,qappname)
>>> *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
>>> from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
>>>
>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No 
>>> such file or directory
>>>
>>>
>>>
>>>
>>
>


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
To demonstrate that this is not necessarily a path issue, but rather an issue
of the archive not being unpacked, I have created a zip file containing
a python script in its root directory. The archive is added to hive and
then an attempt is made to invoke the python script within a transform
query. But we get a "file not found" from the map task, indicating that
the archive is not being exploded.

Show that the python script "classifier_wf.py" is resident in the
*root *directory
of the zip file:
$ jar -tvf py.zip | grep classifier_wf.py
 11241 Tue Jun 18 19:37:02 UTC 2013 classifier_wf.py

Add the archive to hive:
   hive> add archive /opt/am/ver/1.0/hive/py.zip;
   Added resource: /opt/am/ver/1.0/hive/py.zip

Run a transform query:

  hive>from (select transform (aappname,qappname) using
'classifier_wf.py' as (aappname2 string, qappname2 string) from eqx ) o
insert overwrite table c select o.aappname2, o.qappname2;

Get an error:   ;)

Check the logs:

Caused by: java.io.IOException: Cannot run program "classifier_wf.py":
java.io.IOException: error=2, No such file or directory




2013/6/20 Stephen Boesch 

>
> @Stephen:  given the  'relative' path for hive is from a local downloads
> directory on each local tasktracker in the cluster,  it was my thought that
> if the archive were actually being expanded then
> somedir/somefileinthearchive  should work.  I will go ahead and test this
> assumption.
>
> In the meantime, is there any facility available in hive for making
> archived files available to hive jobs?  archive or hadoop archive ("har")
> etc?
>
>
> 2013/6/20 Stephen Sprague 
>
>> what would be interesting would be to run a little experiment and find
>> out what the default PATH is on your data nodes.  How much of a pain would
>> it be to run a little python script to print to stderr the value of the
>> environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>>
>> that's of course going through normal channels of "add file".
>>
>> the thing is given you're using a relative path "hive/parse_qx.py"  you
>> need to know what the "current directory" is when the process runs on the
>> data nodes.
>>
>>
>>
>>
>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch wrote:
>>
>>>
>>> We have a few dozen files that need to be made available to all
>>> mappers/reducers in the cluster while running  hive transformation steps .
>>>
>>> It seems the "add archive"  does not make the entries unarchived and
>>> thus available directly on the default file path - and that is what we are
>>> looking for.
>>>
>>> To illustrate:
>>>
>>>add file modelfile.1;
>>>add file modelfile.2;
>>>..
>>> add file modelfile.N;
>>>
>>>   Then, our model that is invoked during the transformation step *does *have
>>> correct access to its model files in the defaul path.
>>>
>>> But .. those model files take low *minutes* to all load..
>>>
>>> instead when we try:
>>>add archive  modelArchive.tgz.
>>>
>>> The problem is the archive does not get exploded apparently ..
>>>
>>> I have an archive for example that contains shell scripts under the
>>> "hive" directory stored inside.  I am *not *able to access
>>> hive/my-shell-script.sh  after adding the archive. Specifically the
>>> following fails:
>>>
>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
>>> appminer/bin/launch-quixey_to_xml.sh
>>>
>>> from (select transform (aappname,qappname)
>>> *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
>>> from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
>>>
>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No 
>>> such file or directory
>>>
>>>
>>>
>>>
>>
>


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Boesch
@Stephen: given that the 'relative' path for hive is resolved from a local downloads
directory on each local tasktracker in the cluster, it was my thought that
if the archive were actually being expanded then
somedir/somefileinthearchive should work. I will go ahead and test this
assumption.

In the meantime, is there any facility available in hive for making
archived files available to hive jobs?  archive or hadoop archive ("har")
etc?


2013/6/20 Stephen Sprague 

> what would be interesting would be to run a little experiment and find out
> what the default PATH is on your data nodes.  How much of a pain would it
> be to run a little python script to print to stderr the value of the
> environmental variable $PATH and $PWD (or the shell command 'pwd') ?
>
> that's of course going through normal channels of "add file".
>
> the thing is given you're using a relative path "hive/parse_qx.py"  you
> need to know what the "current directory" is when the process runs on the
> data nodes.
>
>
>
>
> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch  wrote:
>
>>
>> We have a few dozen files that need to be made available to all
>> mappers/reducers in the cluster while running  hive transformation steps .
>>
>> It seems the "add archive"  does not make the entries unarchived and thus
>> available directly on the default file path - and that is what we are
>> looking for.
>>
>> To illustrate:
>>
>>add file modelfile.1;
>>add file modelfile.2;
>>..
>> add file modelfile.N;
>>
>>   Then, our model that is invoked during the transformation step *does *have
>> correct access to its model files in the defaul path.
>>
>> But .. those model files take low *minutes* to all load..
>>
>> instead when we try:
>>add archive  modelArchive.tgz.
>>
>> The problem is the archive does not get exploded apparently ..
>>
>> I have an archive for example that contains shell scripts under the
>> "hive" directory stored inside.  I am *not *able to access
>> hive/my-shell-script.sh  after adding the archive. Specifically the
>> following fails:
>>
>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
>> appminer/bin/launch-quixey_to_xml.sh
>>
>> from (select transform (aappname,qappname)
>> *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string)
>> from eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
>>
>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such 
>> file or directory
>>
>>
>>
>>
>


Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
You only get three: the field separator, the array elements separator (aka
collection delimiter), and the map key/value separator (aka map key
delimiter).

When you nest deeper you have to use the defaults '^D', '^E', etc. for
each level. At least that's been my experience, and I've found it has worked
successfully.
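
For the DDL in this thread, a minimal sketch of the two-level case looks like this
(the '|' covers the first level of nesting; the struct members inside then fall
back to the next default control character, ^C / \003):

create table example (
  col1 int,
  col2 array<struct<st1:int, st2:string>>
)
row format delimited
  fields terminated by ','
  collection items terminated by '|';

-- matching sample row; the struct members are still separated by the default ^C:
-- 1,1^Cstring1|2^Cstring2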


On Thu, Jun 20, 2013 at 7:45 AM, neha  wrote:

> Thanks a lot for your reply, Stephen.
> To answer your question - I was not aware of the fact that we could use
> delimiter (in my example, '|') for first level of nesting. I tried now and
> it worked fine.
>
> My next question - Is there any way to provide delimiter in DDL for second
> level of nesting?
> Thanks again!!
>
>
> On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague wrote:
>
>> its all there in the documentation under "create table" and it seems you
>> got everything right too except one little thing - in your second example
>> there for 'sample data loaded' - instead of '^B' change that to '|'  and
>> you should be good. That's the delimiter that separates your two array
>> elements - ie collections.
>>
>> i guess the real question for me is when you say 'since there is no way
>> to use given delimiter "|" ' what did you mean by that?
>>
>>
>>
>> On Thu, Jun 20, 2013 at 1:42 AM, neha  wrote:
>>
>>> Hi All,
>>>
>>> I have 2 questions about complex data types in nested composition.
>>>
>>> 1 >> I did not find a way to provide delimiter information in DDL if one
>>> or more column has nested array/struct. In this case, default delimiter has
>>> to be used for complex type column.
>>> Please let me know if this is a limitation as of now or I am missing
>>> something.
>>>
>>> e.g.:
>>> *DDL*:
>>> hive> create table example(col1 int, col2
>>> array<struct<st1:int, st2:string>>) row format delimited fields terminated
>>> by ',';
>>> OK
>>> Time taken: 0.226 seconds
>>>
>>> *Sample data loaded:*
>>> 1,1^Cstring1^B2^Cstring2
>>>
>>> *O/P:*
>>> hive> select * from example;
>>> OK
>>> 1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>>> Time taken: 0.288 seconds
>>>
>>> 2 >> For the same DDL given above, if we provide clause* collection
>>> items terminated by '|' *and still use default delimiters (since there
>>> is no way to use given delimiter '|') then the select query shows incorrect
>>> data.
>>> Please let me know if this is something expected.
>>>
>>> e.g.
>>> *DDL*:
>>> hive> create table example(col1 int, col2
>>> array<struct<st1:int, st2:string>>) row format delimited fields terminated
>>> by ',' collection items terminated by '|';
>>> OK
>>> Time taken: 0.175 seconds
>>>
>>> *Sample data loaded:*
>>> 1,1^Cstring1^B2^Cstring2
>>>
>>> *O/P:
>>> *hive> select * from
>>> example;
>>>
>>> OK
>>> 1[{"st1":1,"st2":"string1\u00022"}]
>>> Time taken: 0.141 seconds
>>> **
>>> Thanks & Regards.
>>>
>>
>>
>


Re: Is there a mechanism similar to hadoop -archive in hive (add archive is not apparently)

2013-06-20 Thread Stephen Sprague
What would be interesting would be to run a little experiment and find out
what the default PATH is on your data nodes. How much of a pain would it
be to run a little python script that prints to stderr the values of the
environment variables $PATH and $PWD (or the shell command 'pwd')?

That's of course going through the normal channels of "add file".

The thing is, given you're using a relative path "hive/parse_qx.py", you
need to know what the "current directory" is when the process runs on the
data nodes.
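
A rough sketch of that experiment, assuming a hypothetical probe script
(env_probe.py below is a made-up name, not something from this thread) that
copies stdin to stdout and writes os.getcwd() and os.environ['PATH'] to stderr:

add file /tmp/env_probe.py;

select o.col1
from (select transform (aappname) using 'python env_probe.py' as (col1 string) from eqx) o
limit 1;

-- the PWD/PATH line should then show up in that task's stderr log rather than
-- in the query output.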




On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch  wrote:

>
> We have a few dozen files that need to be made available to all
> mappers/reducers in the cluster while running  hive transformation steps .
>
> It seems the "add archive"  does not make the entries unarchived and thus
> available directly on the default file path - and that is what we are
> looking for.
>
> To illustrate:
>
>add file modelfile.1;
>add file modelfile.2;
>..
> add file modelfile.N;
>
>   Then, our model that is invoked during the transformation step *does *have
> correct access to its model files in the defaul path.
>
> But .. those model files take low *minutes* to all load..
>
> instead when we try:
>add archive  modelArchive.tgz.
>
> The problem is the archive does not get exploded apparently ..
>
> I have an archive for example that contains shell scripts under the "hive"
> directory stored inside.  I am *not *able to access
> hive/my-shell-script.sh  after adding the archive. Specifically the
> following fails:
>
> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
> -rwxrwxr-x stephenb/stephenb664 2013-06-18 17:46
> appminer/bin/launch-quixey_to_xml.sh
>
> from (select transform (aappname,qappname)
> *using *'*hive/parse_qx.py*' as (aappname2 string, qappname2 string) from
> eqx ) o insert overwrite table c select o.aappname2, o.qappname2;
>
> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such 
> file or directory
>
>
>
>


Re: Question regarding nested complex data type

2013-06-20 Thread neha
Thanks a lot for your reply, Stephen.
To answer your question: I was not aware of the fact that we could use the
delimiter (in my example, '|') for the first level of nesting. I tried it now and
it worked fine.

My next question: is there any way to provide a delimiter in the DDL for the
second level of nesting?
Thanks again!!

On Thu, Jun 20, 2013 at 8:02 PM, Stephen Sprague  wrote:

> its all there in the documentation under "create table" and it seems you
> got everything right too except one little thing - in your second example
> there for 'sample data loaded' - instead of '^B' change that to '|'  and
> you should be good. That's the delimiter that separates your two array
> elements - ie collections.
>
> i guess the real question for me is when you say 'since there is no way to
> use given delimiter "|" ' what did you mean by that?
>
>
>
> On Thu, Jun 20, 2013 at 1:42 AM, neha  wrote:
>
>> Hi All,
>>
>> I have 2 questions about complex data types in nested composition.
>>
>> 1 >> I did not find a way to provide delimiter information in DDL if one
>> or more column has nested array/struct. In this case, default delimiter has
>> to be used for complex type column.
>> Please let me know if this is a limitation as of now or I am missing
>> something.
>>
>> e.g.:
>> *DDL*:
>> hive> create table example(col1 int, col2
>> array<struct<st1:int, st2:string>>) row format delimited fields terminated
>> by ',';
>> OK
>> Time taken: 0.226 seconds
>>
>> *Sample data loaded:*
>> 1,1^Cstring1^B2^Cstring2
>>
>> *O/P:*
>> hive> select * from example;
>> OK
>> 1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
>> Time taken: 0.288 seconds
>>
>> 2 >> For the same DDL given above, if we provide clause* collection
>> items terminated by '|' *and still use default delimiters (since there
>> is no way to use given delimiter '|') then the select query shows incorrect
>> data.
>> Please let me know if this is something expected.
>>
>> e.g.
>> *DDL*:
>> hive> create table example(col1 int, col2
>> array<struct<st1:int, st2:string>>) row format delimited fields terminated
>> by ',' collection items terminated by '|';
>> OK
>> Time taken: 0.175 seconds
>>
>> *Sample data loaded:*
>> 1,1^Cstring1^B2^Cstring2
>>
>> *O/P:
>> *hive> select * from
>> example;
>>
>> OK
>> 1[{"st1":1,"st2":"string1\u00022"}]
>> Time taken: 0.141 seconds
>> **
>> Thanks & Regards.
>>
>
>


Re: Question regarding nested complex data type

2013-06-20 Thread Stephen Sprague
It's all there in the documentation under "create table", and it seems you
got everything right too except one little thing: in your second example,
for 'sample data loaded', instead of '^B' change that to '|' and
you should be good. That's the delimiter that separates your two array
elements, i.e. collections.

I guess the real question for me is: when you say 'since there is no way to
use the given delimiter "|"', what did you mean by that?



On Thu, Jun 20, 2013 at 1:42 AM, neha  wrote:

> Hi All,
>
> I have 2 questions about complex data types in nested composition.
>
> 1 >> I did not find a way to provide delimiter information in DDL if one
> or more column has nested array/struct. In this case, default delimiter has
> to be used for complex type column.
> Please let me know if this is a limitation as of now or I am missing
> something.
>
> e.g.:
> *DDL*:
> hive> create table example(col1 int, col2
> array<struct<st1:int, st2:string>>) row format delimited fields terminated
> by ',';
> OK
> Time taken: 0.226 seconds
>
> *Sample data loaded:*
> 1,1^Cstring1^B2^Cstring2
>
> *O/P:*
> hive> select * from example;
> OK
> 1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
> Time taken: 0.288 seconds
>
> 2 >> For the same DDL given above, if we provide clause* collection items
> terminated by '|' *and still use default delimiters (since there is no
> way to use given delimiter '|') then the select query shows incorrect data.
> Please let me know if this is something expected.
>
> e.g.
> *DDL*:
> hive> create table example(col1 int, col2
> array<struct<st1:int, st2:string>>) row format delimited fields terminated
> by ',' collection items terminated by '|';
> OK
> Time taken: 0.175 seconds
>
> *Sample data loaded:*
> 1,1^Cstring1^B2^Cstring2
>
> *O/P:
> *hive> select * from
> example;
>
> OK
> 1[{"st1":1,"st2":"string1\u00022"}]
> Time taken: 0.141 seconds
> **
> Thanks & Regards.
>


Re: Hive select shows null after successful data load

2013-06-20 Thread Stephen Sprague
Hooray! Over one hurdle and onto the next one. So something about that
one nested array caused the problem. Very strange. I wonder if there is a
smaller test case to look at, as it seems not all arrays break it, since I
see one for the attribute "values".

As to the formatting issue, I don't believe the native hive client has much
to offer there; it's bare-bones and record oriented. Beeline seems to be
another open source hive client that looks to have more options; you might
have a gander at that, though I don't think it has anything special for
pretty-printing arrays, maps or structs, but I could be wrong.

And then of course there is nothing stopping you from exploring piping that
gnarly stuff into python (or whatever) and having it come out the other end
all nice and pretty, and then posting that here. :)
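
One option the thread doesn't mention, for the "I cannot map 1 value to the other"
problem quoted below, is to flatten the array with LATERAL VIEW explode() so that
each job becomes its own row and the columns line up. A rough sketch against the
linkedin_jobsearch schema shown further down:

select j.company.name, j.position.title, j.locationdescription
from linkedin_jobsearch
lateral view explode(jobs.values) jv as j;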


On Wed, Jun 19, 2013 at 7:54 PM, Sunita Arvind wrote:

> Finally I could get it to work. The issue resolves once I remove the arrays
> within the position structure. So that is the limitation of the serde. I
> changed 'industries' to a string and 'jobfunctions' to a map, and I
> can query the table just fine now. Here is the complete DDL for reference:
>
> create external table linkedin_Jobsearch (
>
> jobs STRUCT<
> values : ARRAY<STRUCT<
> company : STRUCT<
> id : STRING,
> name : STRING>,
> postingDate : STRUCT<
> year : STRING,
> day : STRING,
> month : STRING>,
> descriptionSnippet : STRING,
> expirationDate : STRUCT<
> year : STRING,
> day : STRING,
> month : STRING>,
> position : STRUCT<
> jobFunctions : MAP,--these were arrays of structure
> in my previous attempts
> industries : STRING,
> title : STRING,
>
> jobType : STRUCT<
> code : STRING,
> name : STRING>,
> experienceLevel : STRUCT<
> code : STRING,
> name : STRING>>,
> id : STRING,
> customerJobCode : STRING,
> skillsAndExperience : STRING,
> salary : STRING,
> jobPoster : STRUCT<
> id : STRING,
> firstName : STRING,
> lastName : STRING,
> headline : STRING>,
> referralBonus : STRING,
> locationDescription : STRING>>>
> )
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/user/sunita/tables/jobs';
>
> Thanks Stephen for sharing your thoughts. It helped.
>
> Also if someone /Stephen could help me display this information in a
> useful manner, that would be great. Right now all the values show up as
> arrays. Here is what I mean:
> For a query like this:
> hive> select jobs.values.company.name, jobs.values.position.title,
> jobs.values.locationdescription from linkedin_jobsearch;
>
> This is the output:
>
> ["CyberCoders","CyberCoders","CyberCoders","Management Science
> Associates","Google","Google","CyberCoders","CyberCoders","HP","Sigmaways","Global
> Data Consultancy","Global Data
> Consultancy","CyberCoders","CyberCoders","CyberCoders","VMware","CD IT
> Recruitment","CD IT Recruitment","Digital Reasoning Systems","AOL"]
> ["Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics","Software
> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics","Software
> Engineer-Hadoop, HDFS, HBase, Pig- Vertica Analytics","Data
> Architect","Systems Engineer, Site Reliability Engineering","Systems
> Engineer, Site Reliability Engineering","NoSQL Engineer - MongoDB for big
> data, web crawling - RELO OFFER","NoSQL Engineer - MongoDB for big data,
> web crawling - RELO OFFER","Hadoop Database Administrator Medicare","Hadoop
> / Big Data Consultant","Lead Hadoop developer","Head of Big Data -
> Hadoop","Hadoop Engineer - Hadoop, Operations, Linux Admin, Java,
> Storage","Sr. Hadoop Administrator - Hadoop, MapReduce, HDFS","Sr. Hadoop
> Administrator - Hadoop, MapReduce, HDFS","Software Engineer - Big
> Data","Hadoop Team Lead Consultant - Global Leader in Big Data
> solutions","Hadoop Administrator Consultant - Global Leader in Big Data
> solutions","Java Developer","Sr.Software Engineer-Big Data-Hadoop"]
> ["Pittsburgh, PA","Pittsburgh, PA","Harrisburg, PA","Pittsburgh, PA
> (Shadyside area near Bakery Square)","Pittsburgh, PA, USA","Pittsburgh,
> PA","Cleveland, OH","Akron, OH","Herndon, VA","Cupertino, CA","London,
> United Kingdom","London, United Kingdom","Mountain View, CA","san jose,
> CA","Santa Clara, CA","Palo Alto, CA","Home based - Live anywhere in the UK
> or Benelux","Home based - Live anywhere in the UK or Benelux","Herndon,
> VA","Dulles, VA"]
> Time taken: 8.518 seconds
>
> All company names come into an array, all position titles into another
> array and all locationdescription into yet another array. I cannot map 1
> value to the other.
>
> The below query gives a decent output where individual columns can be
> somewhat mapped:
>
> hive> select jobs.values[0].company.name, jobs.values[0].position.title,
> jobs.values[0].locationdescription from linkedin_jobsearch;
>
> CyberCoders Software Engineer-Hadoop, HDFS, HBase, Pig- Vertica
> Analytics  Pittsburgh, PA
> Time taken: 8.543 seconds
>
> But if I want to get the whole list this does not work. I have tried
> setting Input and output formats and setting serde properties to m
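
If the goal is one row per job rather than three parallel arrays, a LATERAL
VIEW with explode() over the jobs.values array may be what is needed. A
minimal sketch against the table above (untested here, and assuming
jobs.values is an ARRAY<STRUCT<...>> as in the DDL):

select j.company.name, j.position.title, j.locationdescription
from linkedin_jobsearch
lateral view explode(jobs.values) jobs_exploded as j;

Each array element becomes its own row, so company, title and location line
up per job instead of coming back as separate arrays.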

Is there a mechanism similar to hadoop -archive in hive (add archive apparently is not)

2013-06-20 Thread Stephen Boesch
We have a few dozen files that need to be made available to all
mappers/reducers in the cluster while running hive transformation steps.

It seems the "add archive"  does not make the entries unarchived and thus
available directly on the default file path - and that is what we are
looking for.

To illustrate:

   add file modelfile.1;
   add file modelfile.2;
   ..
   add file modelfile.N;

  Then, our model that is invoked during the transformation step *does* have
correct access to its model files on the default path.

But it takes a few *minutes* for all of those model files to load.

Instead, when we try:
   add archive modelArchive.tgz

the problem is that the archive apparently does not get exploded.

For example, I have an archive that contains shell scripts stored under a
"hive" directory inside it. I am *not* able to access hive/my-shell-script.sh
after adding the archive. Specifically, the following fails:

$ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
-rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46
appminer/bin/launch-quixey_to_xml.sh

from (select transform (aappname, qappname)
*using* '*hive/parse_qx.py*' as (aappname2 string, qappname2 string) from
eqx ) o insert overwrite table c select o.aappname2, o.qappname2;

Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
No such file or directory
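
Whether the files inside an added archive are reachable, and under which
path, depends on how the archive is localized on the task nodes. The usual
DistributedCache behaviour is to unpack an added archive under a link named
after the archive file, so - and this is an assumption, not something
verified on this cluster - the script may need to be referenced through that
prefix. A hedged sketch:

add archive modelArchive.tgz;

from (
  select transform (aappname, qappname)
  -- assumption: the unpacked archive is reachable under a directory
  -- named after the archive file
  using 'modelArchive.tgz/hive/parse_qx.py'
  as (aappname2 string, qappname2 string)
  from eqx
) o
insert overwrite table c
select o.aappname2, o.qappname2;

If the archive really is not unpacked at all on the cluster in question,
adding the files individually (as above) remains the fallback.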


Re: "show table" throwing strange error

2013-06-20 Thread Mohammad Tariq
Thank you for the response ma'am. It didn't help either.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jun 20, 2013 at 8:43 AM, Sunita Arvind wrote:

> Your issue seems familiar. Try logging out of the hive session and logging back in.
>
> Sunita
>
>
> On Wed, Jun 19, 2013 at 8:53 PM, Mohammad Tariq wrote:
>
>> Hello list,
>>
>>  I have a hive (0.9.0) setup on my Ubuntu box running
>> hadoop-1.0.4. Everything was going smoothly till now, but today when I issued
>> *show tables* I got a strange error on the CLI. Here is the error:
>>
>> hive> show tables;
>> FAILED: Parse Error: line 1:0 character '' not supported here
>> line 1:1 character '' not supported here
>> line 1:2 character '' not supported here
>> line 1:3 character '' not supported here
>> line 1:4 character '' not supported here
>> line 1:5 character '' not supported here
>> [... the same "character '' not supported here" message repeats for each
>> subsequent character position ...]
>> line 1:378 character '' not supported here
>> line 1:379 character '' not supported here
>> line 1:380 character '' not supported here
>> line 1:381 character '' not supported here
>>
>> Strangely, other queries like *select foo from pokes where bar = 'tariq';* are
>> working fine. I tried to search over the net but could not find anything
>> useful. Need some help.
>>
>> Thank you so much for your time.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>
>


Question regarding nested complex data type

2013-06-20 Thread neha
Hi All,

I have 2 questions about complex data types in nested composition.

1 >> I did not find a way to provide delimiter information in the DDL if one
or more columns have a nested array/struct. In this case, the default
delimiters have to be used for the complex type column.
Please let me know if this is a limitation as of now or if I am missing
something.

e.g.:
*DDL*:
hive> create table example(col1 int, col2
array<struct<st1:int,st2:string>>) row format delimited fields terminated
by ',';
OK
Time taken: 0.226 seconds

*Sample data loaded:*
1,1^Cstring1^B2^Cstring2

*O/P:*
hive> select * from example;
OK
1[{"st1":1,"st2":"string1"},{"st1":2,"st2":"string2"}]
Time taken: 0.288 seconds

2 >> For the same DDL given above, if we provide the clause *collection items
terminated by '|'* and the data still uses the default delimiters (since
there is no way to use the given delimiter '|'), then the select query shows
incorrect data.
Please let me know if this is expected.

e.g.
*DDL*:
hive> create table example(col1 int, col2
array<struct<st1:int,st2:string>>) row format delimited fields terminated
by ',' collection items terminated by '|';
OK
Time taken: 0.175 seconds

*Sample data loaded:*
1,1^Cstring1^B2^Cstring2

*O/P:*
hive> select * from example;

OK
1[{"st1":1,"st2":"string1\u00022"}]
Time taken: 0.141 seconds
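
The second result looks consistent with LazySimpleSerDe assigning one
separator per nesting level: the clause only changes the array-element
separator, while the struct fields one level deeper still fall back to the
next default separator ('\003', i.e. ^C). Under that reading - an assumption,
not verified against the SerDe source here - the piped DDL would expect data
laid out as in this sketch (the table name example_piped is only
illustrative):

create table example_piped (
  col1 int,
  col2 array<struct<st1:int,st2:string>>
)
row format delimited
  fields terminated by ','
  collection items terminated by '|';

-- a data line this table would expect (hedged):
--   1,1^Cstring1|2^Cstring2
-- i.e. '|' between array elements, ^C between struct fields of each element

With the original file still using ^B between elements, the whole value is
parsed as a single element and split on ^C, which matches the
{"st1":1,"st2":"string1\u00022"} output above.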
Thanks & Regards.