HiveServer2 Thrift Java client

2013-05-24 Thread Ghousia
Hi,

Has anyone explored the Thrift Java client for HiveServer2?

I have a Java client in place which connects to HiveServer2 and gets table
details.

What I have yet to figure out is how to read the actual table content, and how to
get a handle on a table's storage descriptor. ThriftCLIServiceClient does not
provide any methods to work with partitions/databases. Any pointers?

Appreciate your help!

Many thanks,
Ghousia.
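One route that avoids raw Thrift calls is the JDBC driver that ships with HiveServer2. The sketch below assumes the hive-jdbc jar (driver class org.apache.hive.jdbc.HiveDriver) and its dependencies are on the classpath and that the server listens on the default port 10000; the host, the user name "hive" and the table "states" are placeholders. For storage-descriptor details, running DESCRIBE FORMATTED <table> through the same Statement prints the location, input format and SerDe, and if the metastore is reachable directly, HiveMetaStoreClient.getTable(db, table).getSd() returns them programmatically.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: read a table's rows through HiveServer2's JDBC driver instead of raw Thrift calls.
// Assumes hive-jdbc is on the classpath; host, user and table names are placeholders.
final class HiveServer2Reader {

    // HiveServer2 URLs use the jdbc:hive2:// scheme, e.g. jdbc:hive2://localhost:10000/default
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Prints every row of the table as tab-separated values.
    static void dumpTable(String url, String user, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
            ResultSetMetaData md = rs.getMetaData();
            int columns = md.getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int c = 1; c <= columns; c++) {
                    if (c > 1) {
                        row.append('\t');
                    }
                    row.append(rs.getString(c));
                }
                System.out.println(row);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = jdbcUrl(args.length > 0 ? args[0] : "localhost", 10000, "default");
        try {
            // Loading the driver class lets it register itself with DriverManager.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            System.err.println("hive-jdbc is not on the classpath; cannot connect to " + url);
            return;
        }
        dumpTable(url, "hive", "states");  // "hive" and "states" are placeholders
    }
}
```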


Re: Difference between like %A% and %a%

2013-05-24 Thread Sai Sai


Just wondering about this; please let me know if you have any suggestions as to why
we are getting these results.

This query does not return any data:

Query 1: hive (test)> select full_name from states where abbreviation like '%a%';


But this query returns data successfully:

Query 2: hive (test)> select full_name from states where abbreviation like '%A%';

Result of Query 1:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201305240156_0012, Tracking URL = http://ubuntu:50030/jobdetails.jsp?jobid=job_201305240156_0012
Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305240156_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-05-24 03:51:04,939 Stage-1 map = 0%,  reduce = 0%
2013-05-24 03:51:10,970 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:11,983 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:12,988 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:13,995 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:15,004 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:16,013 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:17,020 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.46 sec
MapReduce Total cumulative CPU time: 460 msec
Ended Job = job_201305240156_0012
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.46 sec   HDFS Read: 848 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 460 msec
OK
full_name
Time taken: 19.558 seconds

But this query returns data successfully:

hive (test)> select full_name from states where abbreviation like '%A%';

Result of Query 2:


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201305240156_0011, Tracking URL = http://ubuntu:50030/jobdetails.jsp?jobid=job_201305240156_0011
Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305240156_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-05-24 03:50:32,163 Stage-1 map = 0%,  reduce = 0%
2013-05-24 03:50:38,193 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:39,196 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:40,199 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:41,206 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:42,210 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:43,221 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:44,227 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.47 sec
MapReduce Total cumulative CPU time: 470 msec
Ended Job = job_201305240156_0011
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.47 sec   HDFS Read: 848 HDFS Write: 115 SUCCESS
Total MapReduce CPU Time Spent: 470 msec
OK
full_name
Alabama
Alaska
Arizona
Arkansas
California
Georgia
Iowa
Louisiana
Massachusetts  
Pennsylvania
Virginia
Washington
Time taken: 20.551 seconds

Thanks
Sai

Re: Difference between like %A% and %a%

2013-05-24 Thread Jov
2013/5/24 Sai Sai saigr...@yahoo.in

Unlike MySQL, string matching in Hive is case-sensitive, so '%A%' is not
equivalent to '%a%'.
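A minimal Java sketch of that behaviour, assuming the abbreviation column holds upper-case codes such as AL and AK: the LIKE pattern is translated into a regular expression ('%' becomes any run of characters, '_' a single character) and matched case-sensitively, so '%a%' matches nothing while '%A%' does. The translation is illustrative, not Hive's own implementation, and it ignores escape characters. Lower-casing the value first, which is what lower(abbreviation) LIKE '%a%' would do in a query, gives the case-insensitive result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Sketch of case-sensitive SQL LIKE matching ('%' = any run of characters, '_' = one character).
// Escape characters are not handled; this is an illustration, not Hive's implementation.
final class LikeDemo {

    // Translates a LIKE pattern into a Java regex that must match the whole value.
    static Pattern toRegex(String likePattern) {
        StringBuilder re = new StringBuilder();
        for (char ch : likePattern.toCharArray()) {
            if (ch == '%') {
                re.append(".*");
            } else if (ch == '_') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(ch)));  // every other character is literal
            }
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String likePattern) {
        return toRegex(likePattern).matcher(value).matches();
    }

    // Keeps matching values; lowerFirst mimics writing lower(column) LIKE '...' in the query.
    static List<String> filter(List<String> values, String likePattern, boolean lowerFirst) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            String candidate = lowerFirst ? v.toLowerCase(Locale.ROOT) : v;
            if (like(candidate, likePattern)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few upper-case state abbreviations, standing in for the abbreviation column.
        List<String> abbreviations = Arrays.asList("AL", "AK", "AZ", "CA", "NY");
        System.out.println(filter(abbreviations, "%a%", false)); // []
        System.out.println(filter(abbreviations, "%A%", false)); // [AL, AK, AZ, CA]
        System.out.println(filter(abbreviations, "%a%", true));  // [AL, AK, AZ, CA]
    }
}
```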


-- 
Jov
blog: http://amutu.com/blog


Re: Where can we see the results of Select * from states

2013-05-24 Thread Sai Sai
I have created an external table called states under a database called test,
then loaded the table successfully.
Then I tried:

Select * from states;

It runs the MapReduce job successfully and displays the results in the console, but I am
wondering where to look in HDFS to see these results.

I have looked under all the directories in the filesystem browser at the URL below but cannot see
the results part file.

http://localhost.localdomain:50070/dfshealth.jsp


Also, if I would like to save the results of a query to a specific file, how can I
do it?

For example:
    Select * from states > myStates.txt;
Is there something like this?
Thanks
Sai

Re: Where to find the external table file in HDFS

2013-05-24 Thread Sai Sai
I have created an external table, states, and loaded it from the file
/tmp/states.txt.

Then, at the URL:

http://localhost.localdomain:50070/dfshealth.jsp

I looked to see whether the file for the states table exists, and I do not see it.
I am wondering whether it is saved in HDFS or not.

How many days will files exist under the /tmp folder?
Thanks
Sai

Re: Where can we see the results of Select * from states

2013-05-24 Thread Jov
You can write data into the filesystem from a query using
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...

More details:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintofilesystemfromqueries
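Once the results are in a directory, they can be read like any other text files. The sketch below assumes a local directory written by a statement such as INSERT OVERWRITE LOCAL DIRECTORY '/tmp/mystates' SELECT * FROM states (the path is a placeholder) and assumes the default Ctrl-A (\001) field delimiter. It skips hidden and underscore-prefixed files, such as the checksum files a local write may leave, and prints each row tab-separated.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reads the text output of INSERT OVERWRITE LOCAL DIRECTORY, assuming Ctrl-A (char 1) separates fields.
final class LocalDirectoryResults {

    private static final char FIELD_DELIMITER = (char) 1;

    // Returns every row of every data file in dir, with fields re-joined by tabs.
    static List<String> readAsTsv(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Skip sub-directories plus hidden or bookkeeping files (e.g. ".000000_0.crc").
                if (Files.isRegularFile(entry) && !name.startsWith(".") && !name.startsWith("_")) {
                    files.add(entry);
                }
            }
        }
        Collections.sort(files);  // listing order is unspecified; sort for stable output
        List<String> rows = new ArrayList<>();
        for (Path file : files) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                rows.add(line.replace(FIELD_DELIMITER, '\t'));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/mystates");  // placeholder path
        if (!Files.isDirectory(dir)) {
            System.err.println("No such directory: " + dir);
            return;
        }
        for (String row : readAsTsv(dir)) {
            System.out.println(row);
        }
    }
}
```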


-- 
Jov
blog: http://amutu.com/blog


Re: Difference between like %A% and %a%

2013-05-24 Thread Sai Sai
But it should return more results for this:

%a%

than for

%A%

Please let me know if I am missing something.
Thanks
Sai




From: Jov am...@amutu.com
To: user@hive.apache.org; Sai Sai saigr...@yahoo.in
Sent: Friday, 24 May 2013 4:39 PM
Subject: Re: Difference between like %A% and %a%

2013/5/24 Sai Sai saigr...@yahoo.in

unlike MySQL, string in Hive is case sensitive,so '%A%' is not equal with '%a%'.

-- 
Jov
blog: http://amutu.com/blog

Re: How to look at the metadata of the tables we have created.

2013-05-24 Thread Sai Sai
Is it possible to look at the metadata of the databases/tables/views we have
created in Hive?
Is there something like sysobjects in Hive?
Thanks
Sai

Re: Difference between like %A% and %a%

2013-05-24 Thread John Omernik
I have mentioned this before, and I think this is a big miss by the Hive team.
LIKE, by default in many SQL RDBMSs (like MSSQL or MySQL), is not case
sensitive. Thus, when new users move over to Hive and see a command like
LIKE, they will assume similarity (as with many other SQL-like qualities),
and false negatives may ensue. Even if it stays different by default (I am
OK with this... I guess; my personal preference is that it match the
defaults on other systems, but in the end I'm fine with it, just grumbly :) ),
please give us the ability to set that behavior in hive-site.xml. That way,
when an org realizes that it is different and its users are all getting
false negatives, it can just update hive-site.xml and fix the problem rather
than having to include it in training that may or may not work. I've added
this comment to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278
for fun. :)

Please? :)




On Fri, May 24, 2013 at 7:53 AM, Dean Wampler deanwamp...@gmail.com wrote:

 Your where clause looks at the abbreviation, requiring 'A', not the state
 name. You got the correct answer.


 On Fri, May 24, 2013 at 6:21 AM, Sai Sai saigr...@yahoo.in wrote:

 But it should get more results for this:

 %a%

 than for

 %A%

 Please let me know if i am missing something.
 Thanks
 Sai



 --
 Dean Wampler, Ph.D.
 @deanwampler
 http://polyglotprogramming.com



Re: Difference between like %A% and %a%

2013-05-24 Thread Dean Wampler
Hortonworks has announced plans to make Hive more SQL compliant. I suspect
bugs like this will be addressed sooner or later. It will be necessary to
handle backwards compatibility, but that could be done with a Hive
property that enables one behavior or the other.

On Fri, May 24, 2013 at 8:07 AM, John Omernik j...@omernik.com wrote:


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: Difference between like %A% and %a%

2013-05-24 Thread Edward Capriolo
It is not really a bug so much as the way Hive is designed.

https://issues.apache.org/jira/browse/HIVE-4070#comment-13666362

So there already is a 'like' and an 'rlike'; 'mlike' is a good idea. It seems
like an easy UDF (low-hanging fruit) type of issue that anyone could tackle.
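As a rough idea of how small such a function could be, here is a sketch of the matching logic for a case-insensitive LIKE. The name mlike comes from the comment above; the class and method names are hypothetical. It works on plain Strings so it runs without Hive; a real UDF would wrap evaluate() in a class extending org.apache.hadoop.hive.ql.exec.UDF and be registered with CREATE TEMPORARY FUNCTION. The compiled pattern is cached because a UDF is typically called once per row with the same pattern.

```java
import java.util.regex.Pattern;

// Core logic of a hypothetical case-insensitive LIKE ("mlike"); not part of Hive.
final class CaseInsensitiveLike {

    private String lastPattern;    // pattern seen on the previous call
    private Pattern lastCompiled;  // its compiled form, reused while the pattern is unchanged

    // Returns null if either argument is null (SQL-style), otherwise whether value matches.
    Boolean evaluate(String value, String likePattern) {
        if (value == null || likePattern == null) {
            return null;
        }
        if (!likePattern.equals(lastPattern)) {
            lastCompiled = compile(likePattern);
            lastPattern = likePattern;
        }
        return lastCompiled.matcher(value).matches();
    }

    // '%' -> ".*", '_' -> ".", everything else literal; case is ignored (Unicode-aware).
    static Pattern compile(String likePattern) {
        StringBuilder re = new StringBuilder();
        for (char ch : likePattern.toCharArray()) {
            if (ch == '%') {
                re.append(".*");
            } else if (ch == '_') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.compile(re.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    public static void main(String[] args) {
        CaseInsensitiveLike mlike = new CaseInsensitiveLike();
        for (String abbr : new String[] {"AL", "AK", "NY"}) {
            System.out.println(abbr + " mlike '%a%' = " + mlike.evaluate(abbr, "%a%"));
        }
    }
}
```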


On Fri, May 24, 2013 at 9:16 AM, Dean Wampler deanwamp...@gmail.com wrote:


how to load data from SequenceFile(with Snappy compression) into hive

2013-05-24 Thread Ramesh R N
Hi,

I have been trying to import data from a SequenceFile stored in HDFS and
compressed with Snappy (the original file is a massive log file).
I created the tables in the Hive metastore (MySQL), installed Snappy,
and tried several approaches:
  1. gave the direct path with the hdfs:// prefix
  2. downloaded the file and imported it as a local file, like:
     LOAD DATA LOCAL INPATH 'FlumeData.1362965571811' OVERWRITE INTO TABLE recordsflume;

Can somebody shed some light on how to import data from a SequenceFile into
Hive?
Thanks in advance.

regards
Ramesh


Re: Difference between like %A% and %a%

2013-05-24 Thread Dean Wampler
If backwards compatibility weren't an issue, the Hive code that implements
LIKE could be changed to convert the fields and LIKE strings to lower case
before comparing ;) Of course, there is overhead in doing that.

On Fri, May 24, 2013 at 9:50 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Also I am thinking that the rlike is based on regex and can be told to do
 case insensitive matching.
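That suggestion can be checked from plain Java, on the assumption that RLIKE is backed by java.util.regex with substring (find) semantics: the embedded flag (?i) switches on case-insensitive matching, so a predicate such as abbreviation RLIKE '(?i)a' would be expected to match AL. Treat this as a sketch of the regex behaviour, not of Hive internals.

```java
import java.util.regex.Pattern;

// Shows how a regex behaves under substring matching, as RLIKE is assumed to use.
final class RlikeDemo {

    static boolean rlike(String value, String regex) {
        return Pattern.compile(regex).matcher(value).find();  // find(): match anywhere in the value
    }

    public static void main(String[] args) {
        System.out.println(rlike("AL", "a"));      // false: matching is case-sensitive by default
        System.out.println(rlike("AL", "(?i)a"));  // true: (?i) turns on case-insensitive matching
        System.out.println(rlike("NY", "(?i)a"));  // false: no 'a' or 'A' at all
    }
}
```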



-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: Difference between like %A% and %a%

2013-05-24 Thread Edward Capriolo
It is not as simple a problem as you think. MySQL has the same problem;
it's just that most everyone uses a default charset and comparator.

http://www.bluebox.net/about/blog/2009/07/mysql_encoding/

How do you account for foreign characters like ã, etc.? Is that > A
and < ...?
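To make the point concrete, a few Java standard-library examples show that case mapping depends on language and locale: the German ß upper-cases to two letters, Turkish rules lower-case I to a dotless ı, and an accented letter such as ã is not treated as equal to a under case-insensitive comparison. These are JDK behaviours, shown only to illustrate the problem, not how Hive or MySQL compare strings.

```java
import java.util.Locale;

// Small demonstrations of why "just lower-case both sides" is not language- or locale-neutral.
final class CaseFoldingPitfalls {

    // German sharp s: upper-casing turns one character into two ("SS").
    static boolean sharpSUppercasesToTwoLetters() {
        return "stra\u00dfe".toUpperCase(Locale.ROOT).equals("STRASSE");
    }

    // A char-by-char case-insensitive comparison does not treat them as equal, though.
    static boolean sharpSEqualsIgnoreCase() {
        return "stra\u00dfe".equalsIgnoreCase("STRASSE");
    }

    // Turkish rules lower-case 'I' to dotless U+0131, unlike English or the root locale.
    static String lowerI(String languageTag) {
        return "I".toLowerCase(Locale.forLanguageTag(languageTag));
    }

    // An accented letter is not equal to its base letter under case-insensitive comparison.
    static boolean accentedEqualsBase() {
        return "\u00e3".equalsIgnoreCase("a");
    }

    public static void main(String[] args) {
        System.out.println(sharpSUppercasesToTwoLetters());  // true
        System.out.println(sharpSEqualsIgnoreCase());        // false
        System.out.println(lowerI("tr").equals("\u0131"));   // true
        System.out.println(lowerI("en").equals("i"));        // true
        System.out.println(accentedEqualsBase());            // false
    }
}
```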


On Fri, May 24, 2013 at 11:41 AM, Dean Wampler deanwamp...@gmail.com wrote:


OrcFile writing failing with multiple threads

2013-05-24 Thread Andrew Psaltis
All,
I have a test application that is attempting to add rows to an OrcFile from
multiple threads; however, every time I do, I get exceptions with stack traces
like the following:

java.lang.IndexOutOfBoundsException: Index 4 is outside of 0..5
at org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.get(DynamicIntArray.java:73)
at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:55)
at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:192)
at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:199)
at org.apache.hadoop.hive.ql.io.orc.RedBlackTree.add(RedBlackTree.java:300)
at org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.add(StringRedBlackTree.java:45)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.write(WriterImpl.java:723)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$MapTreeWriter.write(WriterImpl.java:1093)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:996)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:1450)
at OrcFileTester$BigRowWriter.run(OrcFileTester.java:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)


Below is the source code for my sample app, which is heavily based on the
TestOrcFile test case using BigRow. Is there something I am doing wrong here,
or is this a legitimate bug in the ORC writer?
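One likely explanation, offered as an assumption rather than a confirmed diagnosis: the failure happens inside the string dictionary (StringRedBlackTree) of a Writer that, once more than one BigRowWriter is submitted, several workers call into at the same time, and the ORC Writer does not appear to be designed for concurrent addRow calls. If so, either keep a single thread calling addRow (the setup that already works with one BigRowWriter) or serialize the calls. The sketch below shows the second option; RowWriter is a hypothetical stand-in for org.apache.hadoop.hive.ql.io.orc.Writer so the example compiles without Hive.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: many producer threads share one writer that is assumed NOT to be thread-safe,
// so every addRow call is funnelled through a single lock.
final class SerializedOrcWrites {

    // Hypothetical stand-in for the ORC Writer's addRow(Object).
    interface RowWriter {
        void addRow(Object row) throws IOException;
    }

    // Decorator that allows only one thread at a time inside the delegate's addRow.
    static final class SynchronizedRowWriter implements RowWriter {
        private final RowWriter delegate;

        SynchronizedRowWriter(RowWriter delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized void addRow(Object row) throws IOException {
            delegate.addRow(row);
        }
    }

    // Writes threads * rowsPerThread rows concurrently; returns how many rows the writer received.
    static int writeConcurrently(int threads, int rowsPerThread) throws Exception {
        List<Object> received = new ArrayList<>();  // not thread-safe on its own
        RowWriter shared = new SynchronizedRowWriter(received::add);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Callable<Void> task = () -> {
                    for (int i = 0; i < rowsPerThread; i++) {
                        shared.addRow("thread-" + id + "-row-" + i);
                    }
                    return null;
                };
                results.add(pool.submit(task));
            }
            for (Future<Void> f : results) {
                f.get();  // waits for the task and rethrows any failure from it
            }
        } finally {
            pool.shutdown();
        }
        return received.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeConcurrently(10, 1000));  // 10000
    }
}
```

With a single consumer draining bigRowQueue, no lock is needed at all; the decorator is only worth it when several threads must produce rows directly.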

Thanks in advance,
Andrew


----- Java app code follows -----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class OrcFileTester {

    private Writer writer;
    private LinkedBlockingQueue<BigRow> bigRowQueue = new LinkedBlockingQueue<BigRow>();

    public OrcFileTester() {
        try {
            Path workDir = new Path(System.getProperty("test.tmp.dir",
                "target" + File.separator + "test" + File.separator + "tmp"));

            Configuration conf;
            FileSystem fs;
            Path testFilePath;

            conf = new Configuration();
            fs = FileSystem.getLocal(conf);
            testFilePath = new Path(workDir, "TestOrcFile.OrcFileTester.orc");
            fs.delete(testFilePath, false);

            ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
                BigRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
            writer = OrcFile.createWriter(fs, testFilePath, conf, inspector,
                10, CompressionKind.ZLIB, 1, 1);

            final ExecutorService bigRowWorkerPool = Executors.newFixedThreadPool(10);

            // Changing this to more than 1 causes exceptions when writing rows.
            for (int i = 0; i < 1; i++) {
                bigRowWorkerPool.submit(new BigRowWriter());
            }
            for (int i = 0; i < 100; i++) {
                if (0 == i % 2) {
                    bigRowQueue.put(new BigRow(false, (byte) 1, (short) 1024, 65536,
                        Long.MAX_VALUE, (float) 1.0, -15.0, bytes(0, 1, 2, 3, 4),
                        "hi", map("hey", "orc")));
                } else {
                    bigRowQueue.put(new BigRow(false, null, (short) 1024, 65536,
                        Long.MAX_VALUE, (float) 1.0, -15.0, bytes(0, 1, 2, 3, 4),
                        "hi", map("hey", "orc")));
                }
            }

            // Wait until the worker(s) have drained the queue.
            while (!bigRowQueue.isEmpty()) {
                Thread.sleep(2000);
            }
            bigRowWorkerPool.shutdownNow();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public void WriteBigRow() {

    }

    private static Map<Text, Text> map(String... items) {
        Map<Text, Text> result = new HashMap<Text, Text>();
        for (String i : items) {
            result.put(new Text(i), new Text(i));
        }
        return result;
    }

    private static BytesWritable bytes(int... items) {
        BytesWritable result = new BytesWritable();
        result.setSize(items.length);
        for (int i = 0; i < items.length; ++i) {
            result.getBytes()[i] = (byte) items[i];
        }
        return result;
    }

Re: Difference between like %A% and %a%

2013-05-24 Thread Anthony Urso
Postgres/Vertica and their ilk have ILIKE, which is a case-insensitive
version of LIKE, in addition to the case-sensitive LIKE. Having both works
well.

Cheers,
Anthony


On Fri, May 24, 2013 at 8:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:


Apache Flume Properties File

2013-05-24 Thread Raj Hadoop
Hi,
 
I just installed Apache Flume 1.3.1 and am trying to run a small example to test it.
Can anyone suggest how I can do this? I am going through the documentation
right now.
 
Thanks,
Raj