Re: Warning when running hive

2015-01-27 Thread Navis류승우
Those configurations are all deprecated, and you can suppress the warning messages
by setting hive.conf.validation=false.
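
For example, a minimal sketch of that setting in hive-site.xml (where exactly it takes
effect, and whether Ambari lets you add it, may vary by setup):

<property>
  <name>hive.conf.validation</name>
  <value>false</value>
</property>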

I don't know why Ambari needs those configurations.

Thanks,
Navis



2015-01-28 12:40 GMT+09:00 Devopam Mittra :

> +1
>
> I also need help on this one in particular
>
> regards
> Devopam
>
> On Tue, Jan 27, 2015 at 7:44 PM, Philippe Kernévez 
> wrote:
>
>> Hi,
>>
>> I get several warnings like "WARN conf.HiveConf: HiveConf of name
>> hive.optimize.mapjoin.mapreduce does not exist" when I run hive.
>>
>> Extract from /etc/hive/conf/hive-site.xml :
>> "
>>   hive.optimize.mapjoin.mapreduce
>>   true
>> ""
>>
>> I can't find those properties in the Hive configuration, but Ambari doesn't
>> allow removing those parameters, nor does it have documentation for them. Should
>> I remove them?
>>
>> The full list is :
>> WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does
>> not exist
>> WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
>> WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation
>> does not exist
>> WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl
>> does not exist
>> WARN conf.HiveConf: HiveConf of name
>> hive.auto.convert.sortmerge.join.noconditionaltask does not exist
>>
>> Regards,
>> Philippe Kernévez
>>
>>
>
>
> --
> Devopam Mittra
> Life and Relations are not binary
>


Re:

2015-01-18 Thread Navis류승우
There seems to be no registered issue in Hive for this. I'd really appreciate it if you could file one.

Thanks,
Navis

2015-01-19 10:56 GMT+09:00 Dayong :

> Is it reported? If not, I'll report it.
>
> Thanks,
> Dayong
>
> On Jan 18, 2015, at 8:41 PM, Navis류승우  wrote:
>
> Yes, it's a bug. It seems the PRECEDING+PRECEDING
> and FOLLOWING+FOLLOWING cases are not handled properly.
>
> Thanks,
> Navis
>
> 2015-01-18 4:40 GMT+09:00 DU DU :
>
>> Hi folks,
>> The window clause in Hive 0.13.* does not work for the following example
>> statements:
>>
>>- BETWEEN 2 PRECEDING AND 1 PRECEDING
>>- BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
>>
>> Is there a reported JIRA for this? If not, I'll create one.
>> Thanks,
>> Will
>>
>>  jdbc:hive2://> SELECT name, dept_num, salary,
>>
>> . . . . . . .> MAX(salary) OVER (PARTITION BY dept_num ORDER BY
>>
>> . . . . . . .> name ROWS
>>
>> . . . . . . .> BETWEEN 2 PRECEDING AND 1 PRECEDING) win4_alter
>>
>> . . . . . . .> FROM employee_contract
>>
>> . . . . . . .> ORDER BY dept_num, name;
>>
>> Error: Error while compiling statement: FAILED: SemanticException Failed
>> to breakup Windowing invocations into Groups. At least 1 group must only
>> depend on input columns. Also check for circular dependencies.
>>
>> Underlying error: Window range invalid, start boundary is greater than
>> end boundary: window(start=range(2 PRECEDING), end=range(1 PRECEDING))
>> (state=42000,code=4)
>>
>>
>>
>> jdbc:hive2://> SELECT name, dept_num, salary,
>>
>> . . . . . . .> MAX(salary) OVER (PARTITION BY dept_num ORDER BY
>>
>> . . . . . . .> name ROWS
>>
>> . . . . . . .> BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) win1
>>
>> . . . . . . .> FROM employee_contract
>>
>> . . . . . . .> ORDER BY dept_num, name;
>>
>> Error: Error while compiling statement: FAILED: SemanticException End of
>> a WindowFrame cannot be UNBOUNDED PRECEDING (state=42000,code=4)
>>
>
>


Re:

2015-01-18 Thread Navis류승우
Yes, it's a bug. It seems the PRECEDING+PRECEDING
and FOLLOWING+FOLLOWING cases are not handled properly.

Thanks,
Navis

2015-01-18 4:40 GMT+09:00 DU DU :

> Hi folks,
> The window clause in Hive 0.13.* does not work for the following example
> statements:
>
>- BETWEEN 2 PRECEDING AND 1 PRECEDING
>- BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
>
> Is there a reported JIRA for this? If not, I'll create one.
> Thanks,
> Will
>
>  jdbc:hive2://> SELECT name, dept_num, salary,
>
> . . . . . . .> MAX(salary) OVER (PARTITION BY dept_num ORDER BY
>
> . . . . . . .> name ROWS
>
> . . . . . . .> BETWEEN 2 PRECEDING AND 1 PRECEDING) win4_alter
>
> . . . . . . .> FROM employee_contract
>
> . . . . . . .> ORDER BY dept_num, name;
>
> Error: Error while compiling statement: FAILED: SemanticException Failed
> to breakup Windowing invocations into Groups. At least 1 group must only
> depend on input columns. Also check for circular dependencies.
>
> Underlying error: Window range invalid, start boundary is greater than end
> boundary: window(start=range(2 PRECEDING), end=range(1 PRECEDING))
> (state=42000,code=4)
>
>
>
> jdbc:hive2://> SELECT name, dept_num, salary,
>
> . . . . . . .> MAX(salary) OVER (PARTITION BY dept_num ORDER BY
>
> . . . . . . .> name ROWS
>
> . . . . . . .> BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) win1
>
> . . . . . . .> FROM employee_contract
>
> . . . . . . .> ORDER BY dept_num, name;
>
> Error: Error while compiling statement: FAILED: SemanticException End of a
> WindowFrame cannot be UNBOUNDED PRECEDING (state=42000,code=4)
>


Re: Query Logs

2014-12-29 Thread Navis류승우
When using the hive shell, it's shown in the console, something like:

hive (default)> select * from src order by key limit 10;
Query ID = navis_20141230100808_09c0a077-442e-4943-a136-710cba6e94d1
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id
application_1419899870643_0001)


Do you mean in the case of a JDBC client?


2014-12-30 5:14 GMT+09:00 P lva :

> Hello everyone,
>
> Is there any way to figure out the query associated with the application id
> when using Tez as the execution engine?
>
> Thanks
>
>


Re: Number of mappers is always 1 for external Parquet tables.

2014-12-28 Thread Navis류승우
Try with "set
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat"
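
A short sketch of how that might look before the failing query (the table name is
taken from the question below; whether you actually get more mappers also depends on
the split-size settings and the file sizes):

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(*) from t_main_wop;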

Thanks,
Navis

2014-12-24 18:27 GMT+09:00 村下瑛 :

> Hi, all
>
> I am trying to load Pig output into Hive as an external table,
> and am currently stuck because Hive always sets the number of mappers to 1,
> even though the table has more than 10 million records and is composed of multiple
> files.
> Does anyone have any idea?
>
> To be more specific, the output is in Parquet format generated by a Pig
> script
> without any compression.
>
> STORE rows INTO '/table-data/test' USING parquet.pig.ParquetStorer;
>
> The directory does contain 16 part-m-00xx.parquet files and _metadata.
> And the external table is pointed to the directory.
>
> Here is the create table statement I've used.
>
> CREATE EXTERNAL TABLE `t_main_wop`(
>   `id` string,
>   `f1` string,
>   ...
>  )
> STORED AS PARQUET
> LOCATION
>   '/table-data/test';
>
> It seems to properly read the Parquet files themselves, since
> SELECT * FROM test;
> returns the proper result.
>
> However, every time I give it queries that require mapreduce jobs,
> it only uses a single mapper and takes forever.
>
> hive> select count(*) from t_main_wop;
> Query ID = xxx
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Starting Job = job_yyy, Tracking URL = zzz
> Kill Command = hadoop_job  -kill job_yyy
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 1
> 2014-12-24 02:49:46,912 Stage-1 map = 0%,  reduce = 0%
> 2014-12-24 02:50:45,847 Stage-1 map = 0%,  reduce = 0%
>
>
> Why is that?
> I've set mapred.map.tasks=100, but to no avail.
> Again, the directory contains 16 part files, so I think it should be able to
> use at least 16 mappers.
>
> I would really appreciate it if you could give me any suggestions.
> Thanks,
>
> Akira
>
>
>
>
>


Re: Row Delimiter in Create table

2014-12-17 Thread Navis류승우
AFAIK, it was restricted by the implementation of Hadoop. But now Hadoop 2
supports custom record delimiters, so hopefully it can also be implemented in Hive.

I'm not sure, but a currently possible way of doing that is setting
"textinputformat.record.delimiter" in the table properties.

Thanks,
Navis

2014-12-18 6:20 GMT+09:00 Gayathri Swaroop :
>
> Hi,
>
> I am trying to create a table for a text file that has a row delimiter other
> than the newline character. I know Hive's create table does not support anything
> other than newline. I have columns whose data contains newlines, so I
> specified a different row delimiter in my Sqoop import. What are the best options? As
> far as I can tell from googling and the manual, a transformation is required, but
> my table is really huge.
>
> Thanks,
> G
>


Re: [ANNOUNCE] New Hive PMC Member - Prasad Mujumdar

2014-12-09 Thread Navis류승우
Congratulations!

2014-12-10 8:35 GMT+09:00 Jason Dere :

> Congrats!
>
> On Dec 9, 2014, at 3:02 PM, Venkat V  wrote:
>
> > Congrats Prasad!
> >
> > On Tue, Dec 9, 2014 at 2:32 PM, Brock Noland  wrote:
> > Congratulations Prasad!!
> >
> > On Tue, Dec 9, 2014 at 2:17 PM, Carl Steinbach  wrote:
> > I am pleased to announce that Prasad Mujumdar has been elected to the
> Hive Project Management Committee. Please join me in congratulating Prasad!
> >
> > Thanks.
> >
> > - Carl
> >
> >
> >
> >
> > --
> > Venkat V
>
>
>


Re: UDF related: org.apache.hive.com.esotericsoftware.kryo.KryoException

2014-10-21 Thread Navis류승우
Are the states in the UDF (represented as "states (SomeClass3)" in the trace) really needed?
If not, you can try marking them as transient fields.
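
A rough sketch of what that could look like (the class, field and lookup content
below are made up, standing in for SomeClass3's states): the heavyweight state is
marked transient so Kryo skips it when the plan is serialized, and it is rebuilt
lazily on first use.

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyLookupUDF extends GenericUDF {

  // transient: excluded from plan serialization, rebuilt after deserialization
  private transient Map<String, String> states;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    if (states == null) {              // lazy rebuild of the transient state
      states = new HashMap<String, String>();
      states.put("CA", "California");  // placeholder content
    }
    Object key = arguments[0].get();
    return key == null ? null : states.get(key.toString());
  }

  @Override
  public String getDisplayString(String[] children) {
    return "my_lookup(" + children[0] + ")";
  }
}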

Thanks,
Navis

2014-10-20 23:59 GMT+09:00 Harel Gliksman :

> Hello,
>
> I am experiencing inconsistent behavior when trying to use UDF on 0.13.1
> on Amazon's EMR (AMI 3.2.1).
>
> I generated a uber jar and deployed a UDF like so:
>
> create temporary function someFunction as "hive.udf.localization.MyUDF"
> using jar "s3://waze.mapreduce.shared/scripts/Hive/MyHive.jar";
>
> I am having these 2 (related?) problems:
>
> 1) When I simply try to use my UDF I get
>
> Error: java.lang.RuntimeException:
> org.apache.hive.com.esotericsoftware.kryo.KryoException:
> java.lang.NullPointerException
> Serialization trace:
> childRectangles (SomeClass1)
> statesTree (SomeClass2)
> states (SomeClass3)
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:360)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:271)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:438)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:431)
> at
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
> at
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:410)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException:
> java.lang.NullPointerException...
>
> After digging I found this:
>
>
> http://mail-archives.apache.org/mod_mbox/hive-dev/201408.mbox/%3CJIRA.12733732.1407927053435.81293.1408008733570@arcas%3E
>
> *and after setting hive.plan.serialization.format=javaXML,
> the UDF is running OK on my test data set of 50 lines.*
>
> 2) When running the UDF in a more complex 2-joins query I am getting a
> somewhat related error:
>
> org.apache.hive.com.esotericsoftware.kryo.KryoException:
> java.lang.NullPointerException
> Serialization trace:
> org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task
> Error: java.lang.NullPointerException
> Serialization trace:
> childRectangles (SomeClass1)
> statesTree (SomeClass2)
> states (SomeClass3)
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.JoinOperator)
> opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
> mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)...
>
> *This does not go away even after
> setting hive.plan.serialization.format=javaXML *
>
> Can someone please advise?
> Many thanks,
> Harel.
>


Re: ALTER TABLE T1 PARTITION(P1) CONCATENATE bug?

2014-10-14 Thread Navis류승우
Could you tell the version number of hive?

Thanks,
Navis

2014-10-15 2:00 GMT+09:00 Time Less :

> I have found a work-around for this bug. After you issue the ALTER
> TABLE...CONCATENATE command, issue:
>
> ALTER TABLE T1 PARTITION (P1) SET LOCATION
> ".../apps/hive/warehouse/DB1/T1/P1";
>
> This will fix the metadata that CONCATENATE breaks.
>
>
> ––
> *Tim Ellis:* 510-761-6610
>
>
> On Mon, Oct 13, 2014 at 10:37 PM, Time Less 
> wrote:
>
>> Has anyone seen anything like this? Google searches turned up nothing, so
>> I thought I'd ask here, then file a JIRA if no-one thinks I'm doing it
>> wrong.
>>
>> If I ALTER a particular table with three partitions once, it works.
>> Second time it works, too, but reports it is moving a directory to the
>> Trash that doesn't exist (still, this doesn't kill it). The third time I
>> ALTER the table, it crashes, because the directory structure has been
>> modified to something invalid.
>>
>> Here's a nearly-full output of the 2nd and 3rd runs. The ALTER is exactly
>> the same both times (I just press UP ARROW):
>>
>>
>> *HQL, 2nd Run:*
>> hive (analytics)> alter table bidtmp partition
>> (log_type='bidder',dt='2014-05-01',hour=11) concatenate ;
>>
>>
>> *Output:*
>> Starting Job = job_1412894367814_0017, Tracking URL =
>> application_1412894367814_0017/
>> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill
>> job_1412894367814_0017
>> Hadoop job information for null: number of mappers: 97; number of
>> reducers: 0
>> 2014-10-13 20:28:23,143 null map = 0%,  reduce = 0%
>> 2014-10-13 20:28:36,042 null map = 1%,  reduce = 0%, Cumulative CPU 49.69
>> sec
>> ...
>> 2014-10-13 20:31:56,415 null map = 99%,  reduce = 0%, Cumulative CPU
>> 812.65 sec
>> 2014-10-13 20:31:57,458 null map = 100%,  reduce = 0%, Cumulative CPU
>> 813.88 sec
>> MapReduce Total cumulative CPU time: 13 minutes 33 seconds 880 msec
>> Ended Job = job_1412894367814_0017
>> Loading data to table analytics.bidtmp partition (log_type=bidder,
>> dt=2014-05-01, hour=11)
>> rmr: DEPRECATED: Please use 'rm -r' instead.
>> Moved: '.../apps/hive/warehouse/analytics.db/bidtmp/
>> *dt=2014-05-01/hour=11/log_type=bidder*' to trash at:
>> .../user/hdfs/.Trash/Current
>> *// (note the bold-faced path doesn't exist, the partition is specified
>> as log_type first, then dt, then hour)*
>> Partition analytics.bidtmp*{log_type=bidder, dt=2014-05-01, hour=11}*
>> stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
>> *(here, the partition ordering is correct!)*
>> MapReduce Jobs Launched:
>> Job 0: Map: 97   Cumulative CPU: 813.88 sec   HDFS Read: 30298871932 HDFS
>> Write: 28746848923 SUCCESS
>> Total MapReduce CPU Time Spent: 13 minutes 33 seconds 880 msec
>> OK
>> Time taken: 224.128 seconds
>>
>>
>> *HQL, 3rd Run:*
>> hive (analytics)> alter table bidtmp partition
>> (log_type='bidder',dt='2014-05-01',hour=11) concatenate ;
>>
>>
>> *Output:*
>> java.io.FileNotFoundException: File does not exist:
>> .../apps/hive/warehouse/analytics.db/bidtmp/dt=2014-05-01/hour=11/log_type=bidder
>> *(because it should be log_type=.../dt=.../hour=... - not this order)*
>> at org.apache.hadoop.hdfs.
>> DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
>> at
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
>> at
>> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:419)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>> at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>> at
>> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>> at
>> org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execu

Re: HiveException: Stateful expressions cannot be used inside of CASE

2014-10-07 Thread Navis류승우
A stateful function should be called for all input rows, but inside an if/when
clause that cannot be guaranteed.

Is there any reason to declare the "protect_column" function as stateful?
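
For reference, a minimal sketch of the annotation involved (the class name is
hypothetical): unless the UDF really accumulates state across rows, dropping
stateful=true should let it be used inside CASE/WHEN again.

import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;

// stateful=true demands per-row invocation guarantees that CASE cannot provide;
// a non-stateful (possibly still non-deterministic) declaration avoids the error
@UDFType(deterministic = false, stateful = false)
public abstract class ProtectColumnSketch extends GenericUDF {
  // initialize / evaluate / getDisplayString as in the real protect_column UDF
}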

Thanks,
Navis

2014-09-25 3:42 GMT+09:00 Dan Fan :

>  Hi Hive Users:
>
>  I have a generic Hive UDF called protect_column.
> The UDF works fine when I call it alone.
> But when I run the following query:
>
>
>   select case when id = 5 then protect_column(id, 'age', 12L) else id end
> from one_row_table ;
>
>
>  It says
>
>
>  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Stateful
> expressions cannot be used inside of CASE
>
>
>  I was reading the source code. And I think it is related to GenericCase
> and GenericWhen according to
>
>
> https://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java?p=1197837
>
>  Could anyone explain explicitly what exactly GenericCase and
> GenericWhen are, and why we cannot put the UDF inside a CASE WHEN?
>
>  Thanks for your time helping me out
>
>  Best
>
>  Dan
>


Re: Hive splits/adds rows when outputting dataset with new lines

2014-10-06 Thread Navis류승우
Try with set hive.default.fileformat=SequenceFile;
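
A hedged sketch of that workaround applied to the tables from the report below:

set hive.default.fileformat=SequenceFile;

CREATE TABLE notcorrupted AS
SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate, wordsmerged
FROM singlerow;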

Thanks,
Navis

2014-10-06 20:51 GMT+09:00 Maciek :

> Hello,
>
> I've encountered a situation where printing new lines corrupts (multiplies)
> the returned dataset.
> This seems to be similar to HIVE-3012
> (fixed in 0.11), but as
> I'm on Hive 0.13, it's still the case.
> Here are the steps to illustrate/reproduce:
>
> 1. First let's create a table with one row and one column by selecting from
> any existing table (substitute ANYTABLE accordingly):
>
> CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM
> ANYTABLE LIMIT 1;
>
> and verify:
>
> SELECT * FROM singlerow;
>
> OK---
> worldofhostels
>
> Time taken: 0.028 seconds, Fetched: 1 row(s)
>
> All good so far.
> 2. Now let's introduce newline here by:
>
> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate FROM
> singlerow;
>
> OK--
>
> world
> of
> hostels
>
> Time taken: 6.404 seconds, Fetched: 3 row(s)
> and I'm suddenly getting 3 rows now.
> 3. This is not just for CLI output as when submitting CTAS, it
> materializes such corrupted result set:
>
> CREATE TABLE corrupted AS
> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate,
> wordsmerged FROM singlerow;
>
> hive> select * from corrupted;
>
> OK
>
> world NULL
> of NULL
> hostels worldofhostels
>
> Time taken: 0.029 seconds, Fetched: 3 row(s)
> Apparently, the same happens: the new table is split into multiple rows, and
> columns following the one in question (like wordsmerged) become NULLs.
> Am I doing something wrong here?
>
> Regards,
> Maciek
>


Re: hive query with in statement

2014-08-12 Thread Navis류승우
Could you try "cast(calldate as string)"?
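
That is, something like (a sketch based on the query below):

select * from cdr where cast(calldate as string) in ('2014-08-11','2014-05-02');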

Thanks,
Navis


2014-08-12 20:22 GMT+09:00 ilhami Kalkan :

> Hi all,
> I have a problem with the IN statement in HiveQL. My table is "cdr", with a column
> "calldate" whose type is "date". The first query returns successfully:
> select * from cdr where calldate = '2014-05-02';
>
> But when I query with an IN statement,
>
> select * from cdr where calldate in ( '2014-08-11','2014-05-02');
>
> it returns the exception below:
>
> Error: Error while processing statement: FAILED: SemanticException [Error
> 10014]: Line 1:38 Wrong arguments ''20014-03-02'': The arguments for IN
> should be the same type! Types are: {date IN (string, string)}
> (state=42000,code=10014)
>
> How can I handle this?
> Thanks.
>
> Hive version 0.12
>
>
>
>


Re: stop hive from generating job file for every query

2014-07-30 Thread Navis류승우
Set the value of "hive.querylog.location" to an empty string in hive-site.xml.
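
A minimal sketch of the hive-site.xml entry (I haven't verified whether an empty
value alone is enough on every version):

<property>
  <name>hive.querylog.location</name>
  <value></value>
</property>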

Thanks,
Navis


2014-07-31 13:08 GMT+09:00 Gitansh Chadha :

> Hi,
>
> I want to stop hive commands from generating the hive job file (under
> /tmp/user/hive_log_job*) for every query, as we run multiple queries in
> batch and the file is getting really big. (1GB+)
>
> what would be the best way to do it?
>
> Thanks in advance,
> g
>


Re: hive auto join conversion

2014-07-30 Thread Navis류승우
Could you try it with hive.ignore.mapjoin.hint=false? The mapjoin hint is
ignored by default from hive-0.11.0 onward (see
https://issues.apache.org/jira/browse/HIVE-4042).
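
For example (a sketch), before running the migrated query:

set hive.ignore.mapjoin.hint=false;
-- the existing /*+ MAPJOIN(sup) */ hint should then be honored again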

Thanks,
Navis


2014-07-31 10:04 GMT+09:00 Chen Song :

> I am using cdh5 with hive 0.12. We have some hive jobs migrated from hive
> 0.10 and they are written like below:
>
> select /*+ MAPJOIN(sup) */ c1, c2, sup.c
> from
> (
> select key, c1, c2 from table1
> union all
> select key, c1, c2 from table2
> ) table
> left outer join
> sup
> on (table.c1 = sup.key)
> distribute by c1
>
> In Hive 0.10 (CDH4), Hive translates the left outer join into a map join
> (map only job), followed by a regular MR job for distribute by.
>
> In Hive 0.12 (CDH5), Hive is not able to convert the join into a map join.
> Instead it launches a common map reduce for the join, followed by another
> mr for distribute by. However, when I take out the union all operator, Hive
> seems to be able to create a single MR job, with map join on map phase, and
> reduce for distribute by.
>
> I read a bit on
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
> and found out that there are some restrictions on map side join starting
> Hive 0.11. The following are not supported.
>
>
>- Union Followed by a MapJoin
>- Lateral View Followed by a MapJoin
>- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By)
>Followed by MapJoin
>- MapJoin Followed by Union
>- MapJoin Followed by Join
>- MapJoin Followed by MapJoin
>
>
> So if one side of the join (the big side) is a union of some tables and the
> other side is a small table, Hive would not be able to do a map join at
> all? Is that correct?
>
> If correct, what should I do to make the job backward compatible?
>
> --
> Chen Song
>
>


Re: A question about SessionManager

2014-07-24 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-5799 is for that kind of case,
but it is not included in any release yet.

Thanks,
Navis


2014-07-24 20:04 GMT+09:00 Zhanghe (D) :

> Hey Guys,
>
>I'm working with HiveServer2. I know the HiveServer holds a session for
> each client and closes it when the client executes 'CloseSession'.
>But if the client is forced to terminate, e.g. with Ctrl+Z or kill -9, the
> session in HiveServer will not be closed.
>Isn't that a problem? How can we solve it?
>
> Thanks.


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-23 Thread Navis류승우
Looks like it's caused by HIVE-7314. Could you try that with
"hive.cache.expr.evaluation=false"?

Thanks,
Navis


2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) :

> Yes. The output is correct: ["tp","p","sp"].
>
> I developed the UDF using Java in Eclipse and exported the jar file into
> the auxlib directory of Hive. Then I added the following line to the
> ~/.hiverc file.
>
> create temporary function getad as 'xxx';
>
> The hive version is 0.12.0. Perhaps the problem resulted from a
> mis-optimization in Hive.
>
>
> On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin  wrote:
>
>> Have you tried this query without UDF, say:
>>
>> select
>>   array(tp, p, sp) as ps
>> from
>>   (
>>   select
>> 'tp' as tp,
>> 'p' as p,
>> 'sp' as sp
>>   from
>> table_name
>>   where
>> id = 
>>   ) t;
>>
>>
>> And how do you implement the UDF?
>>
>>
>> Thanks,
>> 金杰 (Jie Jin)
>>
>>
>> On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花)  wrote:
>>
>>>  Recently I developed a Hive Generic UDF *getad*. It accepts a map type
>>> and a string type parameter and outputs a string value. But I found the UDF
>>> output really confusing in different conditions.
>>>
>>> Condition A:
>>>
>>> select
>>>   getad(map_col, 'tp') as tp,
>>>   getad(map_col, 'p') as p,
>>>   getad(map_col, 'sp') as sp
>>> from
>>>   table_name
>>> where
>>>   id = ;
>>>
>>> The output is right: 'tp', 'p', 'sp'.
>>>
>>> Condition B:
>>>
>>> select
>>>   array(tp, p, sp) as ps
>>> from
>>>   (
>>>   select
>>> getad(map_col, 'tp') as tp,
>>> getad(map_col, 'p') as p,
>>> getad(map_col, 'sp') as sp
>>>   from
>>> table_name
>>>   where
>>> id = 
>>>   ) t;
>>>
>>> The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs
>>> the same result:
>>>
>>> select
>>>   array(
>>> getad(map_col, 'tp'),
>>> getad(map_col, 'p'),
>>> getad(map_col, 'sp')
>>>   ) as ps
>>> from
>>>   table_name
>>> where
>>>   id = ;
>>>
>>> Could you please provide me some hints on this? Thanks!
>>>
>>> --
>>> 丁桂涛
>>>
>>
>>
>
>
> --
> 丁桂涛
>


Re: Help on restricting users

2014-07-21 Thread Navis류승우
If there is a proper authentication mechanism, you can access the user
information and the query plan in HookContext. And, in a somewhat tricky way, it's
possible to find out whether the query is a "select *" or not.

But if that's not the case (no authentication) and you cannot control the login name
of the Hive JDBC connection, a hook might not be that helpful.
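
A rough sketch of such a hook, under those assumptions (the class name, the naive
query-string check and the "admin" policy are all made up; a real implementation
would inspect the query plan rather than the raw SQL text):

import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

public class RestrictSelectStarHook implements ExecuteWithHookContext {

  @Override
  public void run(HookContext hookContext) throws Exception {
    QueryPlan plan = hookContext.getQueryPlan();
    String query = plan.getQueryStr().toLowerCase();
    String user = hookContext.getUgi().getShortUserName();

    // very naive "select * from ..." detection, for illustration only
    boolean selectStar = query.matches("(?s)\\s*select\\s+\\*\\s+from.*");
    if (selectStar && !"admin".equals(user)) {   // hypothetical policy
      // throwing from a pre-execution hook aborts the query
      throw new RuntimeException("select * is not allowed for user " + user);
    }
  }
}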

Thanks,
Navis


2014-07-22 4:12 GMT+09:00 sai chaitanya tirumerla :

> Hi Navis,
>
> Thank you so much for the information. I'm a newbie to hooks in Hive. Could
> you please let me know how we can implement hooks for restricting users, and
> whether there are any references/examples to look at.
>
> Thanks,
>
> Sai
>
>
> On Sun, Jul 20, 2014 at 9:23 PM, Navis류승우  wrote:
>
>> You can implement that in Hook and register in hive-site.xml.
>>
>> Thanks,
>> Navis
>>
>>
>> 2014-07-19 17:32 GMT+09:00 sai chaitanya tirumerla :
>>
>> Hi,
>>>
>>> I would like to restrict users doing
>>> "select * from table;" when accessed from any jdbc/odbc tools like sql
>>> workbench/excel etc.. connecting to hiveserver2 on port 1. I am able to
>>> successfully restrict users from running mapreduce jobs like "select
>>> count(*) from table" by changing permissions on tmp directory on hdfs
>>> allowing only certain users to access.
>>> I have tried using SQL based authorization but it is introduced in hive
>>> 0.13 and the version i am currently on is hive 0.11.
>>> I have also tried using hive client authorization which works only for
>>> hive cli but not hiveserver2 when connected from jdbc/odbc tools as the
>>> connection is done via the default user.
>>>
>>> So is there any way that we can restrict users accessing the data (
>>> select * from table )?
>>>
>>> Thanks in Advance!!
>>>
>>> ---sai---
>>>
>>
>>
>


Re: Help on restricting users

2014-07-20 Thread Navis류승우
You can implement that in a hook and register it in hive-site.xml.
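
For example, a sketch of the registration in hive-site.xml (the hook class is
hypothetical; it has to implement one of the hook interfaces in
org.apache.hadoop.hive.ql.hooks and be on the server's classpath):

<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.example.RestrictSelectStarHook</value>
</property>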

Thanks,
Navis


2014-07-19 17:32 GMT+09:00 sai chaitanya tirumerla :

> Hi,
>
> I would like to restrict users doing
> "select * from table;" when accessed from any jdbc/odbc tools like sql
> workbench/excel etc.. connecting to hiveserver2 on port 1. I am able to
> successfully restrict users from running mapreduce jobs like "select
> count(*) from table" by changing permissions on tmp directory on hdfs
> allowing only certain users to access.
> I have tried using SQL based authorization but it is introduced in hive
> 0.13 and the version i am currently on is hive 0.11.
> I have also tried using hive client authorization which works only for
> hive cli but not hiveserver2 when connected from jdbc/odbc tools as the
> connection is done via the default user.
>
> So is there any way that we can restrict users accessing the data ( select
> * from table )?
>
> Thanks in Advance!!
>
> ---sai---
>


Re: exchange partition documentation

2014-07-20 Thread Navis류승우
HIVE-4095, originally intended to implement,

alter table  exchange partition () with
table ;

But in the implementation, Dheeraj Kumar Singh, the original implementor, seems to have
gotten confused and implemented this in an inverted manner (target to source).

HIVE-6129 fixed this (in 0.13.0), and now it's consistent with the documentation. But
the example has never worked in any version, because HIVE-4095
required that target_table and source_table have the same partition columns.

https://issues.apache.org/jira/browse/HIVE-6133 is the one needed for the
example to work, but it seems to have failed to attract any interest.

Thanks,
Navis


2014-07-21 11:40 GMT+09:00 Lefty Leverenz :

> I'd be happy to update the docs, but need some guidance.  The syntax
> confused me originally -- see comments on HIVE-4095.
>  I'll add this discussion to those comments.
>
>
> -- Lefty
>
>
> On Sun, Jul 20, 2014 at 10:22 PM, Andre Araujo  wrote:
>
>> Indeed! The documentation is a fair bit off.
>>
>> I've tested the below on Hive 0.12 on CDH and it works fine.
>> Lefty, would you please update the documentation on the two pages below?
>>
>> ---
>> Source:
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExchangePartition
>> "Exchange Partition" section
>>
>> {code}
>> ALTER TABLE source_table_name EXCHANGE PARTITION (partition_spec) WITH
>> TABLE target_table_name;
>> {code}
>>
>> This statement lets you move the data in a partition from a table to
>> another table that has the same schema and partition keys, but does not
>> already have that partition.
>> For details, see Exchange Partition and HIVE-4095.
>>
>> ---
>> Source:
>> https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition
>>  The EXCHANGE PARTITION DDL command has been proposed as part of
>> https://issues.apache.org/jira/browse/HIVE-4095.
>>
>> The syntax is:
>>
>> {code}
>> alter table  exchange partition ()
>> with table ;
>> {code}
>>
>> The partition spec can be fully or partially specified.
>>
>> The semantics of the above statement is that the data is moved from the
>> source table to the target table. Both the tables must have the same schema
>> and the same partition keys. The operation fails in the presence of an
>> index.
>> The partition(s) must exist in the source table and must NOT exist in the
>> target one. Consider the following examples:
>>
>> ## Full partition spec
>>
>> {code}
>> create table T1(a string, b string) partitioned by (ds string);
>> create table T2(a string, b string) partitioned by (ds string);
>> alter table T1 add partition (ds = '1');
>> {code}
>>
>> The operation:
>>
>> {code}
>>  alter table T1 exchange partition (ds='1') with table T2;
>> {code}
>>
>> moves the data from T1 to T2@ds=1. The operation fails if T2@ds=1
>> already exists or T1 and T2 have different schemas and/or partition keys.
>>
>> ## Partial partition spec
>>
>> {code}
>> create table T1(a string, b string) partitioned by (ds string, hr string);
>> create table T2(a string, b string) partitioned by (ds string, hr string);
>> alter table T1 add partition (ds = '1', hr = '00');
>> alter table T1 add partition (ds = '1', hr = '01');
>> alter table T1 add partition (ds = '1', hr = '03');
>> {code}
>>
>> The operation:
>>
>> {code}
>> alter table T1 exchange partition (ds='1') with table T2;
>> {code}
>>
>> moves the 3 partitions from T1 to T2. The operation fails if any of the
>> partitions already exist on T2 or if T1 and T2 have different schemas
>> and/or partition keys.
>> Either all the partitions of T1 will get created or the whole operation
>> will fail. All partitions of T1 are dropped.
>>
>>
>>
>> On 21 July 2014 05:52, Kristof Vanbecelaere <
>> kristof.vanbecela...@gmail.com> wrote:
>>
>>> I think the documentation related to exchanging partitions is not
>>> accurate
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition
>>>
>>> when I try it out on hortonworks sandbox 2.1 which runs Hive 0.13 I get
>>> this:
>>>
>>> hive> create table T1(a string, b string) partitioned by (ds string);
>>>
>>> OK
>>>
>>> Time taken: 0.72 seconds
>>>
>>> hive> create table T2(a string, b string);
>>>
>>> OK
>>>
>>> Time taken: 0.357 seconds
>>>
>>> hive> alter table T1 exchange partition (ds='1') with table T2;
>>>
>>> FAILED: SemanticException [Error 10235]: Tables have incompatible
>>> schemas and their partitions  cannot be exchanged.
>>>
>>>
>>
>>
>> --
>> André Araújo
>> Big Data Consultant/Solutions Architect
>> The Pythian Group - Australia - www.pythian.com
>>
>> Office (calls from within Australia): 1300 366 021 x1270
>> Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696
>> x1270
>> Mobile: +61 410 323 559
>> Fax: +61 2 9805 0544
>> IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com @ GTalk
>>
>> “Suc

Re: random NPE in HiveInputFormat.init() ??

2014-07-20 Thread Navis류승우
I think it's fixed in https://issues.apache.org/jira/browse/HIVE-7011,
which will be included in hive-0.14.0. Sadly, there seems to be no simple
workaround for this.

Thanks,
Navis


2014-07-19 15:03 GMT+09:00 Yang :

>
> we are getting a random (happening about 20% of the time, if we repeatedly
> run the same query) error with hive 0.13.0.2
>
>
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:300)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
> at
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
> Job Submission failed with exception 'java.lang.NullPointerException(null)'
>
>
>
>
>
>
> from google search
> https://www.google.com/search?q=hive+null+pointer+exception+.HiveInputFormat.init(HiveInputFormat.java%3A255)&oq=hive+null+pointer+exception++.HiveInputFormat.init(HiveInputFormat.java%3A255)&aqs=chrome..69i57.16859j0j7&sourceid=chrome&es_sm=93&ie=UTF-8
>
> the top 2 results are pretty relevant:
>
> https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CB8QFjAA&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fhive-user%2F201302.mbox%2F%253CCAKm%3DR7XyX52-AW5fm5N1vdtW30WQfsw-XAw5%3DmLqn55XeRNSJQ%40mail.gmail.com%253E&ei=FwjKU_rDAdPfoATbsIKgBA&usg=AFQjCNEfg7cyVbTxEl-HeZayFdWfb7Qn_w&sig2=M_0aiHBqrifHIChqiFWXrg&bvm=bv.71198958,d.cGU
>
> this one suggests that it's due to the resourcemanager being set to something
> like localhost
>
>
> https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0CCkQFjAB&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fhive-user%2F201308.mbox%2F%253CCAPt%2B2w-0mvzNEk8-9pdpsOag04uG%3D53WF_aRZXG68LsA3MfWGA%40mail.gmail.com%253E&ei=FwjKU_rDAdPfoATbsIKgBA&usg=AFQjCNG4uoRrzD8z1mHy35ztCmt7KRDVOw&sig2=LlreqybSuOurYtbJqRISUQ&bvm=bv.71198958,d.cGU
>
> the second one suggests that you have to set the resource manager instead of
> leaving it empty.
>
>
> Since we do have a valid server value set for
> yarn.resourcemanager.address, and the above error only appears about 20% of the
> time, does it mean that our resourcemanager is unstable?
>
> thanks
> Yang
>


Re: Custom HBaseKeyFactory and ColumnMapping

2014-07-16 Thread Navis류승우
My bad. Could you do that?

Thanks,
Navis


2014-07-17 9:15 GMT+09:00 Andrew Mains :

>  Hi all,
>
> I'm currently experimenting with using the new HBaseKeyFactory interface
> (implemented in https://issues.apache.org/jira/browse/HIVE-6411) to do
> some custom serialization and predicate pushdown on our HBase schema.
> Ideally, I'd like to be able to use the information from the
> hbase.columns.mapping property on the table, and indeed,
> AbstractHBaseKeyFactory seems to support this use case, exposing a
> protected ColumnMappings.ColumnMapping keyMapping member. However,
> ColumnMappings.ColumnMapping exposes no public members (everything is
> package private org.apache.hadoop.hive.hbase), meaning that I can't read
> any data from the ColumnMapping in my custom HBaseKeyFactory.
>
> Is this behavior intentional? Obviously I could work around this by
> declaring my factory in the same package, but it seems like the user
> experience would be better if there were public accessors for the fields in
> ColumnMappings.ColumnMapping. Is there another way to do this that I'm
> missing? If this isn't intentional, I'll raise a JIRA issue and submit a
> (small) patch.
>
> Thanks!
>
> Andrew
>
>
>
>


Re: Hive can I contribute to Hive confluence wiki documents?

2014-07-13 Thread Navis류승우
For the "cache.expr.evaluation" problem, it's fixed in
https://issues.apache.org/jira/browse/HIVE-7314 (it will be included in the next
release).

But I agree that it can be a critical problem for users of the 0.12.0 and
0.13.x versions and needs a proper warning.

Thanks,
Navis


2014-07-13 16:48 GMT+09:00 郭士伟 :

> And I haven't added other information to the wikidoc yet.
>
>
> 2014-07-04 14:37 GMT+08:00 Lefty Leverenz :
>
>> The 'hive.cache.expr.evaluation' parameter is documented in the wikidoc 
>> Configuration
>> Properties
>> .
>>  Have you added other information in a different wikidoc?  If so, I'll link
>> the docs to each other.
>>
>> -- Lefty
>>
>>
>> On Thu, May 29, 2014 at 5:31 AM, Lefty Leverenz 
>> wrote:
>>
>>> Thank you for pointing this out.  The 'hive.cache.expr.evaluation'
>>> parameter was added by HIVE-4209
>>>  and it needs to be
>>> included in the wiki.  Its description is in hive-default.xml.template:
>>>
>>> If true, evaluation result of deterministic expression referenced twice
 or more will be cached. For example, in filter condition like ".. where key
 + 10 > 10 or key + 10 = 0" "key + 10" will be evaluated/cached once and
 reused for following expression ("key + 10 = 0"). Currently, this is
 applied only to expressions in select or filter operator.
>>>
>>>
>>> Do you want to provide more information than that?  To contribute to the
>>> wiki, create a Confluence account
>>>  and send a message
>>> to this list asking for wiki editing privilege and giving your account
>>> name.  But if you just want the description to be added to the wiki, I can
>>> take care of that.
>>>
>>> If you have additional information, it might not belong in Configuration
>>> Properties
>>>  
>>> (although
>>> once the parameter is there it can link to another page for details).  The
>>> Select
>>> 
>>> page is one possibility, or maybe you'll find a better place for it.
>>>
>>> If you want to revise the description in hive-default.xml.template, you
>>> need to open a JIRA and patch the file.
>>>
>>>
>>> -- Lefty
>>>
>>>
>>> On Sat, May 24, 2014 at 1:01 AM, 郭士伟  wrote:
>>>
 I have used the configuration property 'hive.cache.expr.evaluation' in
 hive 0.12 recently and found it useful. But I cannot find any documents
 about it on the wiki page:
 https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.
 Indeed, I picked it up from the source code.

 I think it may be useful for other hive users if I can write some
 documentation about the property.

 So, can anyone give me some pointers on how I can make the change
 happen?

 Thanks, anyway.

>>>
>>>
>>
>


Re: pass new job name to tez

2014-07-09 Thread Navis류승우
In GenTezProcContext, you can find "new
TezWork(conf.getVar(HiveConf.ConfVars.HIVEQUERYID))" part. And the final
name is name + ":" + (++counter);

Thanks,
Navis


2014-07-10 12:43 GMT+09:00 Grandl Robert :

> Hi guys,
>
> I am trying to identify a DAG in Tez with a different id, based on the job
> name (e.g. query55.sql from hive-testbench) + input size.
>
> So my new identifier should be, for example, query55_2048MB. It seems that a
> DAG in Tez already takes a name which comes from jobPlan.getName(),
> passed through the Google protobuf layer. Because the DAG is generated in Hive,
> I think the new identifier should come from Hive, right?
>
> Can you point me to which classes I should change in Hive in order to
> propagate the new identifier for the DAG?
>
> Thanks,
> Robert
>
>


Re: Hive UDF performance issue

2014-07-09 Thread Navis류승우
Yes, 2M x 1M makes 2T pairings in a single reducer.

Thanks,
Navis


2014-07-10 1:50 GMT+09:00 Malligarjunan S :

> Hello All,
> Is it expected behavior for Hive to take so much time?
>
>
> Thanks and Regards,
> Sankar S
>
>
> On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S 
> wrote:
>
>> Hello All,
>>
>> Can anyone help me answer my question posted on Stackoverflow?
>> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
>> It is pretty urgent. Please help me.
>>
>> Thanks and Regards,
>> Sankar S.
>>
>
>


Re: Possible memory leak with 0.13 and JDBC

2014-07-09 Thread Navis류승우
12  [Ljava.lang.Object;
>   19:   357  72512  [Ljava.util.HashMap$Entry;
>   20:  4037  64592  java.lang.Object
>   21:  1608  64320  java.util.TreeMap$Entry
>   22:  1566  62640
>  com.google.common.collect.MapMakerInternalMap$WeakEntry
>   23:   880  56320  java.net.URL
>   24:   167  51048  [I
>   25:   559  44720  java.lang.reflect.Method
>   26:86  40080  [Ljava.util.Hashtable$Entry;
>   27:   163  36512
>  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
>   28:   779  31160  java.util.LinkedHashMap$Entry
>   29:   370  29664  [Ljava.util.WeakHashMap$Entry;
>   30:   457  29248
>  org.apache.hadoop.hive.conf.HiveConf$ConfVars
>   31:   726  29040  java.lang.ref.Finalizer
>   32:   335  26800  java.util.jar.JarFile$JarFileEntry
>   33:  1566  25056
>  com.google.common.collect.MapMakerInternalMap$StrongValueReference
>   34:   351  22464  java.util.jar.JarFile
>   35:   370  20720  java.util.WeakHashMap
>   36:   438  17520  java.lang.ref.SoftReference
>   37:   360  17280  sun.nio.cs.UTF_8$Encoder
>   38:   358  17184  sun.misc.URLClassPath$JarLoader
>   39:   337  16176  java.util.zip.Inflater
>   40:   223  16056  java.lang.reflect.Constructor
>   41:   328  15744  java.util.HashMap
>   42:   537  12888  java.util.LinkedList$Node
>   43:   539  12520  [Ljava.lang.Class;
>   44:   384  12288  java.lang.ref.ReferenceQueue
>   45:   357  11424  java.util.zip.ZipCoder
>   46:   284   9088  java.util.LinkedList
>   47:   357   8568  java.util.ArrayDeque
>   48:   337   8088  java.util.zip.ZStreamRef
>   49:   306   7344  java.lang.Long
>   50:34   7176  [Z
>
>
>
> On 8 July 2014 at 08:40:20, Navis류승우 (navis@nexr.com
> ) wrote:
>
> Could you try "jmap -histo:live " and check for Hive objects that seem
> too numerous?
>
> Thanks,
> Navis
>
>
> 2014-07-07 22:22 GMT+09:00 jonas.partner :
>
>>  Hi Benjamin,
>>  Unfortunately this was a really critical issue for us and I didn't think
>> we would find a fix in time, so we switched to generating hive scripts
>> programmatically and then running them via an Oozie action which uses the Hive
>> CLI.  This seems to create a stable solution, although it is a lot less
>> convenient than JDBC for our use case.
>>
>>  I hope to find some more time to look at this later in the week since
>> JDBC would simplify the solution.  I would be very interested to hear if
>> you make any progress.
>>
>>  Regards
>>
>>  Jonas
>>
>> On 7 July 2014 at 14:14:46, Benjamin Bowman (bbowman...@gmail.com
>> ) wrote:
>>
>>  I believe I am having the same issue.  Hive 0.13 and Hadoop 2.4.  We
>> had to increase the Hive heap to 4 GB which allows Hive to function for
>> about 2-3 days.  After that point it has consumed the entire heap and
>> becomes unresponsive and/or throws OOM exceptions.  We are using  Beeline
>> and HiveServer 2 and connect via JDBC to the database tens of thousands of
>> times a day.
>>
>> I have been working with a developer at Hortonworks to find a solution
>> but we have not come up with anything yet.  Have you made any progress on
>> this issue?
>>
>> Thanks,
>> Benjamin
>>
>>
>> On Thu, Jul 3, 2014 at 4:17 PM, jonas.partner <
>> jonas.part...@opencredo.com> wrote:
>>
>>>  Hi Edward,
>>>
>>>  Thanks for the response.  Sorry I posted the wrong version. I also
>>> added close  on the two result sets to the code taken from the wiki as
>>> below but still the same problem.
>>>
>>>  Will try to run it through your kit at the weekend.  For the moment I
>>> switched to running the statements as a script through the hive client (not
>>> beeline) which seems stable even with hundreds of repetitions.
>>>
>>>  Regards
>>>
>>>  Jonas
>>>
>>>   public static void run() throws SQLException {
>>> try {
>>> Class.forName(driverName);
>>> } catch (ClassNotFoundException e) {
>>> // TODO Auto-generated catch block
>>> e.printStack

Re: Hive UDF performance issue

2014-07-09 Thread Navis류승우
It's a cross product. It's not strange that it takes so much time, even with small
tables.

Thanks,
Navis


2014-07-09 2:53 GMT+09:00 Malligarjunan S :

> Hello All,
>
> Can anyone help me answer my question posted on Stackoverflow?
> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
> It is pretty urgent. Please help me.
>
> Thanks and Regards,
> Sankar S.
>


Re: Possible memory leak with 0.13 and JDBC

2014-07-08 Thread Navis류승우
Could you try "jmap -histo:live " and check for Hive objects that seem
too numerous?

Thanks,
Navis


2014-07-07 22:22 GMT+09:00 jonas.partner :

> Hi Benjamin,
> Unfortunately this was a really critical issue for us and I didn't think
> we would find a fix in time, so we switched to generating hive scripts
> programmatically and then running them via an Oozie action which uses the Hive
> CLI.  This seems to create a stable solution, although it is a lot less
> convenient than JDBC for our use case.
>
> I hope to find some more time to look at this later in the week since JDBC
> would simplify the solution.  I would be very interested to hear if you
> make any progress.
>
> Regards
>
> Jonas
>
> On 7 July 2014 at 14:14:46, Benjamin Bowman (bbowman...@gmail.com
> ) wrote:
>
> I believe I am having the same issue.  Hive 0.13 and Hadoop 2.4.  We had
> to increase the Hive heap to 4 GB which allows Hive to function for about
> 2-3 days.  After that point it has consumed the entire heap and becomes
> unresponsive and/or throws OOM exceptions.  We are using  Beeline and
> HiveServer 2 and connect via JDBC to the database tens of thousands of
> times a day.
>
> I have been working with a developer at Hortonworks to find a solution but
> we have not come up with anything yet.  Have you made any progress on this
> issue?
>
> Thanks,
> Benjamin
>
>
> On Thu, Jul 3, 2014 at 4:17 PM, jonas.partner  > wrote:
>
>>  Hi Edward,
>>
>>  Thanks for the response.  Sorry I posted the wrong version. I also added
>> close  on the two result sets to the code taken from the wiki as below but
>> still the same problem.
>>
>>  Will try to run it through your kit at the weekend.  For the moment I
>> switched to running the statements as a script through the hive client (not
>> beeline) which seems stable even with hundreds of repetitions.
>>
>>  Regards
>>
>>  Jonas
>>
>>   public static void run() throws SQLException {
>> try {
>> Class.forName(driverName);
>> } catch (ClassNotFoundException e) {
>> // TODO Auto-generated catch block
>> e.printStackTrace();
>> System.exit(1);
>> }
>> //replace "hive" here with the name of the user the queries
>> should run as
>> Connection con =
>> DriverManager.getConnection("jdbc:hive2://localhost:1/default", "hive",
>> "");
>> Statement stmt = con.createStatement();
>> String tableName = "testHiveDriverTable";
>> stmt.execute("drop table if exists " + tableName);
>> stmt.execute("create external table  " + tableName + " (key
>> int, value string)");
>> // show tables
>> String sql = "show tables '" + tableName + "'";
>> System.out.println("Running: " + sql);
>> ResultSet res = stmt.executeQuery(sql);
>> if (res.next()) {
>> System.out.println(res.getString(1));
>> }
>>  res.close();
>>  // describe table
>> sql = "describe " + tableName;
>> System.out.println("Running: " + sql);
>> res = stmt.executeQuery(sql);
>>
>> while (res.next()) {
>> System.out.println(res.getString(1) + "\t" +
>> res.getString(2));
>> }
>>  res.close();
>> stmt.close();
>> con.close();
>> }
>>
>>
>>
>> On 3 July 2014 at 21:05:25, Edward Capriolo (edlinuxg...@gmail.com
>> ) wrote:
>>
>>Not saying there is not a leak elswhere but
>> statement and resultset objects both have .close()
>>
>> Java 7 now allows you to autoclose
>> try (  Connection conn ...; Statement st = conn.createStatement() ){
>> something
>> }
>>
>>
>> On Thu, Jul 3, 2014 at 6:35 AM, jonas.partner <
>> jonas.part...@opencredo.com> wrote:
>>
>>>  We have been struggling to get a reliable system working where we
>>> interact with Hive over JDBC a lot.  The pattern we see is that everything
>>> starts ok but the memory used by the Hive server process grows over time
>>> and after some hundreds of operations we start to see exceptions.
>>>
>>>  To ensure there was nothing stupid in our code causing this I took the
>>> example code from the wiki page for Hive 2 clients and put that in a loop.
>>>  For us after about 80 runs we would see exceptions as below.
>>>
>>> 2014-04-21 07:31:02,251 ERROR [pool-5-thread-5]:
>>> server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred
>>> during processing of message.
>>>  java.lang.RuntimeException:
>>> org.apache.thrift.transport.TTransportException
>>> at
>>> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>>> at
>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(T

Re: mapjoin error

2014-07-08 Thread Navis류승우
Looks like HIVE-6913 (https://issues.apache.org/jira/browse/HIVE-6913), which
will be fixed in hive-0.14.0.

Thanks,
Navis


2014-07-04 17:12 GMT+09:00 sunww :

> Hi
> I'm using hive 0.11 and hadoop 2.2.  When I join a large table with
> two empty tables, it converts to a mapjoin automatically. But the MR job failed. This is the
> error log:
> Error: java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.FileNotFoundException:
> /data/data5/yarn/local/usercache/etl/appcache/application_1403597071312_256708/container_1403597071312_256708_01_11/Stage-5.tar.gz/MapJoin-mapfile02--.hashtable
> (No such file or directory)
>
> And I find that the uploaded file is MapJoin-mapfile01--.hashtable, not
> MapJoin-mapfile02--.hashtable
> 2014-07-04 03:16:28 Upload 1 File to:
> file:/tmp/etl/hive_2014-07-04_15-16-19_742_5670921476466011329/-local-10002/HashTable-Stage-5/MapJoin-mapfile01--.hashtable
> File size: 82
> Did I miss something?
>
> The SQL and log are at http://paste2.org/U4zg9WPz
>
> Thanks
>


Re: "desc database extended " doesn't print dbproperties?

2014-06-25 Thread Navis류승우
Booked in https://issues.apache.org/jira/browse/HIVE-7298

Thanks,



2014-06-26 14:28 GMT+09:00 Navis류승우 :

> Seems to be a regression from HIVE-6386. It will be fixed in the next version.
>
>
> 2014-06-26 7:58 GMT+09:00 Sumit Kumar :
>
>  Hey guys,
>>
>> I just discovered that this syntax doesn't print the dbproperties any
>> more. I have two Hive versions that I'm testing the following query on:
>>
>>   create database test2 with dbproperties ('key1' = 'value1', 'key2' =
>> 'value2');
>>   desc database extended test2;
>>
>> The output on hive 11 is:
>>
>> hive>   desc database extended
>> test2;
>> OK
>> test2 hdfs://:9000/warehouse/test2.db   {key2=value2,
>> key1=value1}
>> Time taken: 0.021 seconds, Fetched: 1 row(s)
>>
>> The output on hive 13 is:
>> hive> desc database extended
>> test2;
>> OK
>> test2 hdfs://:9000/warehouse/test2.dbhadoop
>> Time taken: 0.023 seconds, Fetched: 1 row(s)
>>
>> If you look closely, you will notice that no key-value information from
>> dbproperties was printed in the hive 13 case, and somehow magically "hadoop" (I
>> guess it's my userid) appeared.
>>
>> Any idea if this functionality changed since hive 11? Do we have a
>> reference jira? I searched on the wikis and JIRAs but couldn't find a
>> reference; surprised that the language manual wiki (
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL)
>> doesn't even talk about this functionality any more. Would appreciate input
>> on this.
>>
>> Thanks,
>> -Sumit
>>
>
>


Re: "desc database extended " doesn't print dbproperties?

2014-06-25 Thread Navis류승우
Seems to be a regression from HIVE-6386. It will be fixed in the next version.


2014-06-26 7:58 GMT+09:00 Sumit Kumar :

> Hey guys,
>
> I just discovered that this syntax doesn't print the dbproperties any
> more. I have two Hive versions that I'm testing the following query on:
>
>   create database test2 with dbproperties ('key1' = 'value1', 'key2' =
> 'value2');
>   desc database extended test2;
>
> The output on hive 11 is:
>
> hive>   desc database extended
> test2;
> OK
> test2 hdfs://:9000/warehouse/test2.db   {key2=value2,
> key1=value1}
> Time taken: 0.021 seconds, Fetched: 1 row(s)
>
> The output on hive 13 is:
> hive> desc database extended
> test2;
> OK
> test2 hdfs://:9000/warehouse/test2.dbhadoop
> Time taken: 0.023 seconds, Fetched: 1 row(s)
>
> If you look closely, you will notice that no key-value information from
> dbproperties was printed in the hive 13 case, and somehow magically "hadoop" (I
> guess it's my userid) appeared.
>
> Any idea if this functionality changed since hive 11? Do we have a
> reference jira? I searched on the wikis and JIRAs but couldn't find a
> reference; surprised that the language manual wiki (
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL)
> doesn't even talk about this functionality any more. Would appreciate input
> on this.
>
> Thanks,
> -Sumit
>


Re: Hive 0.13/Hcatalog : Mapreduce Exception : java.lang.IncompatibleClassChangeError

2014-06-05 Thread Navis류승우
I don't have an environment to confirm this. But if this happens, we
should include HIVE-6432 in HIVE-0.13.1.


2014-06-05 12:44 GMT+09:00 Navis류승우 :

> It's fixed in HIVE-6432. I think you should rebuild your own hcatalog from
> source with profile -Phadoop-1.
>
>
> 2014-06-05 9:08 GMT+09:00 Sundaramoorthy, Malliyanathan <
> malliyanathan.sundaramoor...@citi.com>:
>
>   Hi,
>>
>> I am using Hadoop 2.4.0 with Hive 0.13 + included package of HCatalog .
>> Wrote a simple map-reduce job from the example and running the code below
>> .. getting “*Exception in thread "main"
>> java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.JobContext, but class was expected“ * .. Not
>> sure of the error I am making ..
>>
>> Not sure if there a compatibility issue .. please help..
>>
>>
>>
>> *boolean* success = *true*;
>>
>>   *try* {
>>
>>   Configuration conf = getConf();
>>
>>   args = *new* GenericOptionsParser(conf,
>> args).getRemainingArgs();
>>
>>   //Hive Table Details
>>
>>   String dbName = args[0];
>>
>>   String inputTableName= args[1];
>>
>>   String outputTableName= args[2];
>>
>>
>>
>>   //Job Input
>>
>>   Job job = *new* *Job**(conf,**"Scenarios"**)*;
>>
>>
>>   //Initialize Map/Reducer Input/Output
>>
>>   HCatInputFormat.*setInput*(job,dbName,inputTableName);
>>
>>   //HCatInputFormat.ssetInput(job,InputJobInfo.create(dbName,
>> inputTableName, null));
>>
>>   job.setInputFormatClass(HCatInputFormat.*class*);
>>
>>   job.setJarByClass(MainRunner.*class*);
>>
>>job.setMapperClass(ScenarioMapper.*class*);
>>
>> job.setReducerClass(ScenarioReducer.*class*);
>>
>>job.setMapOutputKeyClass(IntWritable.*class*);
>>
>> job.setMapOutputValueClass(IntWritable.*class*);
>>
>>
>>
>> job.setOutputKeyClass(WritableComparable.*class*);
>>
>> job.setOutputValueClass(DefaultHCatRecord.*class*);
>>
>>
>>
>> HCatOutputFormat.*setOutput*(job, OutputJobInfo.*create*(dbName,
>> outputTableName, *null*));
>>
>> HCatSchema outSchema = HCatOutputFormat.*getTableSchema*(conf);
>>
>> System.*err*.println("INFO: output schema explicitly set for
>> writing:"+ outSchema);
>>
>>HCatOutputFormat.*setSchema*(job, outSchema);
>>
>> job.setOutputFormatClass(HCatOutputFormat.*class*);
>>
>>
>>
>>
>>
>> 14/06/02 18:52:57 INFO client.RMProxy: Connecting to ResourceManager at
>> localhost/00.04.07.174:8040
>>
>> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
>> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>
>> at
>> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104)
>>
>> at
>> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84)
>>
>> at
>> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73)
>>
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
>>
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
>>
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>>
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>>
>> at java.security.AccessController.doPrivileged(Native Method)
>>
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>>
>> at
>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>>
>> at
>> com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.run(MainRunner.java:79)
>>
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>
>> at
>> com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.main(MainRunner.java:89)
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>
>>
>>
>> Regards,
>>
>> Malli
>>
>>
>>
>
>


Re: Hive 0.13/Hcatalog : Mapreduce Exception : java.lang.IncompatibleClassChangeError

2014-06-04 Thread Navis류승우
It's fixed in HIVE-6432. I think you should rebuild your own hcatalog from
source with profile -Phadoop-1.


2014-06-05 9:08 GMT+09:00 Sundaramoorthy, Malliyanathan <
malliyanathan.sundaramoor...@citi.com>:

>  Hi,
>
> I am using Hadoop 2.4.0 with Hive 0.13 + included package of HCatalog .
> Wrote a simple map-reduce job from the example and running the code below
> .. getting “*Exception in thread "main"
> java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected“ * .. Not
> sure of the error I am making ..
>
> Not sure if there a compatibility issue .. please help..
>
>
>
> *boolean* success = *true*;
>
>   *try* {
>
>   Configuration conf = getConf();
>
>   args = *new* GenericOptionsParser(conf,
> args).getRemainingArgs();
>
>   //Hive Table Details
>
>   String dbName = args[0];
>
>   String inputTableName= args[1];
>
>   String outputTableName= args[2];
>
>
>
>   //Job Input
>
>   Job job = *new* *Job**(conf,**"Scenarios"**)*;
>
>
>   //Initialize Map/Reducer Input/Output
>
>   HCatInputFormat.*setInput*(job,dbName,inputTableName);
>
>   //HCatInputFormat.ssetInput(job,InputJobInfo.create(dbName,
> inputTableName, null));
>
>   job.setInputFormatClass(HCatInputFormat.*class*);
>
>   job.setJarByClass(MainRunner.*class*);
>
>job.setMapperClass(ScenarioMapper.*class*);
>
> job.setReducerClass(ScenarioReducer.*class*);
>
>job.setMapOutputKeyClass(IntWritable.*class*);
>
> job.setMapOutputValueClass(IntWritable.*class*);
>
>
>
> job.setOutputKeyClass(WritableComparable.*class*);
>
> job.setOutputValueClass(DefaultHCatRecord.*class*);
>
>
>
> HCatOutputFormat.*setOutput*(job, OutputJobInfo.*create*(dbName,
> outputTableName, *null*));
>
> HCatSchema outSchema = HCatOutputFormat.*getTableSchema*(conf);
>
> System.*err*.println("INFO: output schema explicitly set for
> writing:"+ outSchema);
>
>HCatOutputFormat.*setSchema*(job, outSchema);
>
> job.setOutputFormatClass(HCatOutputFormat.*class*);
>
>
>
>
>
> 14/06/02 18:52:57 INFO client.RMProxy: Connecting to ResourceManager at
> localhost/00.04.07.174:8040
>
> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> at
> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104)
>
> at
> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84)
>
> at
> org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
>
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>
> at
> com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.run(MainRunner.java:79)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
> at
> com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.main(MainRunner.java:89)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
>
>
> Regards,
>
> Malli
>
>
>


Re: what is the meaning of Table.createTime()?

2014-05-21 Thread Navis류승우
It's seconds.

new Date(time * 1000L);  // multiply as a long to avoid int overflow


2014-05-22 14:19 GMT+09:00 Santhosh Thomas :

> I am trying to find the creation time of a table using table.createTime()
> function. I was hoping that it returns the time in milli seconds, but looks
> like it is not. Any idea how to get the actual table creation time?
>
> thanks
> Santhosh
>


Re: Reading query columns in ExecuteWithHookContext

2014-04-14 Thread Navis류승우
It's a bug in ColumnAccessAnalyzer. I've booked this on
https://issues.apache.org/jira/browse/HIVE-6910.

Thanks,
Navis

2014-04-15 11:41 GMT+09:00 Adeel Qureshi :
> I am trying to read the columns from hive queries being executed by
> implementing the ExecuteWithHookContext hook. This works fine by extracting
> ColumnAccessInfo information from HiveContext (which is passed in)
> .getQueryPlan().getColumnAccessInfo(). This provides access to a
> TableToColumnAccessMap which has all the columns from the query in it along
> with user information. So this works fine.
>
> However when I run same queries on partition tables the list of columns
> returned by TableToColumnAccessMap are not correct. It includes the
> partition columns but ends up excluding some of the non-partioned columns.
> So for a 5 column table with last 2 being partitioned columns it would
> return 1 non-partioned and 2 partioned columns and simply ignore the other
> two partitioned columns. Any ideas on what that might be the case or any
> other ways on getting a handle on columns of a query being run.
>
> Thanks
> Adeel


Re: Hive vs Pig against number of files spawned

2014-04-01 Thread Navis류승우
try
hive.hadoop.supports.splittable.combineinputformat=true;
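For example, a minimal session setup (the split size value is illustrative
and should be tuned for your data; the table name is a placeholder):

  set hive.hadoop.supports.splittable.combineinputformat=true;
  set mapred.max.split.size=268435456;
  select count(*) from your_table;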

Thanks,
Navis

2014-04-01 15:55 GMT+09:00 Sreenath :
> Hi all,
> I have a partitioned table in hive where each partition will have 630 gzip
> compressed files each of average size 100kb. If I query over these files
> using hive it will generate exactly 630 mappers i.e one mapper for one file.
> Now as an experiment i tried reading those files with pig and pig actually
> combined the files and spawned only 2 mappers and the operation was much
> faster than hive.
> Why is there a difference in execution style of pig and hive? In hive can we
> similarly combine small files to spawn less mappers?


Re: Issue with Querying External Hive Table created on hbase

2014-03-19 Thread Navis류승우
You can check the exact reason in the task log, but generally this is
caused by missing libraries in the auxlib configuration.

That includes hive-hbase-handler.jar, hbase-*.jar, guava-*.jar,
zookeeper-*.jar, etc., depending on the versions of your Hive and HBase.
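For example, they can be passed on the command line when starting the Hive
CLI (paths and versions are illustrative; use the ones matching your
installation):

  hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.12.0.jar,$HBASE_HOME/hbase-0.94.6.jar,$HIVE_HOME/lib/guava-11.0.2.jar,$HBASE_HOME/lib/zookeeper-3.4.5.jar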

Thanks,
Navis

2014-03-20 3:42 GMT+09:00 Sunil Ranka :
> Hi All
>
> I am trying to query  External Hive Table created on hbase ( hbase table is
> compressed using "gzip") .  I am getting quick response, if I use "select *
> from hbase_acct_pref_dim_", but the query is taking for ever if I try to
> retrieve data based on the row_key.
>
> hive> select * from hbase_acct_pref_dim_ where key = 30001;
>
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2014-03-19 11:14:04,432 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:15:04,617 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:16:04,792 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:17:04,969 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:18:05,140 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:19:05,315 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:20:05,484 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:21:05,667 Stage-1 map = 0%,  reduce = 0%
> 2014-03-19 11:22:05,835 Stage-1 map = 0%,  reduce = 0%
>
>
> Any Help is appreciated.
>
>
>
> Thanks,
> Sunil S Ranka
>  Blog :: http://sranka.wordpress.com
> "Superior BI is the antidote to Business Failure"
> 
> NOTHING IS IMPOSSIBLE EVEN THE WORD
> IMPOSSIBLE SAYS ,I M POSSIBLE.
> 


Re: ALTER TABLE RENAME TO fully qualified name

2014-03-14 Thread Navis류승우
Yes, there are some similar issues, fixed in various ways. Thinking of
merging all of them into one.

I'll check HIVE-2584. Thanks.

2014-03-14 14:50 GMT+09:00 chenchun :
> Navis, I have already done that, can you take a look? see
> https://issues.apache.org/jira/browse/HIVE-2584
>
> --
> chenchun
>
> On Friday, 14 March, 2014 at 11:50 AM, Navis류승우 wrote:
>
> HIVE-4064 is expected to solve this kind of problems in hive but not
> in progress even it's not that hard to implement.
>
> I'll take a look at it when I'm free.
>
> Thanks,
> Navis
>
> 2014-03-13 22:08 GMT+09:00 Clay McDonald :
>
> Hello everyone, I'm running Hive 0.12.0 and I thought that support for
> moving tables between database was added by using ALTER TABLE RENAME TO
> fully qualified name. Here is what I executing;
>
>
>
> ALTER TABLE default.sale_reports RENAME TO sales_database.sale_reports;
>
>
>
> Thanks, Clay
>
>


Re: Job killed on HiveServer2 restart

2014-03-13 Thread Navis류승우
User provided classes (by adding jars) should be unloaded when the
session is closed. https://issues.apache.org/jira/browse/HIVE-3969 is
about that but it's not resolved yet.

Thanks,
Navis

2014-03-12 8:56 GMT+09:00 Ashu Pachauri :
> We use Hive-0.12 and are planning to use HiveServer2 with cloudera Hue. The
> scenario is like this. There are frequent additions of Hive UDFs by users
> and this requires frequent Hive deployment. To pick up these changes, we
> need to restart HiveServer2.
> When we submit a query to HiveServer2, the job is killed whenever we restart
> HiveServer. So, basically, we find no way to persist the query while picking
> up these Hive UDF changes. Is there a way to persist a query with HS2
> restart or not requiring to restart HS2 for picking up new changes.
>
>
> --
> Thanks and Regards,
> Ashu


Re: ALTER TABLE RENAME TO fully qualified name

2014-03-13 Thread Navis류승우
HIVE-4064 is expected to solve this kind of problem in Hive, but it is not
in progress even though it's not that hard to implement.

I'll take a look at it when I'm free.

Thanks,
Navis

2014-03-13 22:08 GMT+09:00 Clay McDonald :
> Hello everyone, I'm running Hive 0.12.0 and I thought that support for
> moving tables between database was added by using ALTER TABLE RENAME TO
> fully qualified name. Here is what I executing;
>
>
>
> ALTER TABLE default.sale_reports RENAME TO sales_database.sale_reports;
>
>
>
> Thanks, Clay


Re: full outer join result

2014-03-12 Thread Navis류승우
The 100/100 row is the sole match for the join condition, so the following
would be the right result:

NULL  NULL  NULL  40
NULL  NULL  12  35
NULL  NULL  48  NULL
NULL  40  NULL  NULL
12  35  NULL  NULL
48  NULL  NULL  NULL
100 100 100 100

It's fixed by HIVE-3411 and HIVE-3381

2014-03-13 5:38 GMT+09:00 Stephen Sprague :
> well. i had some free time to search it.  from here:
> http://www.postgresql.org/docs/9.3/static/sql-select.html#SQL-UNION you'll
> see the default is indeed UNION DISTINCT.  so changing it to UNION ALL
> you'll get different results - are they the ones you're expecting?
>
>
> On Wed, Mar 12, 2014 at 9:36 AM, Stephen Sprague  wrote:
>>
>> interesting.don't know the answer but could you change the UNION in
>> the Postgres to UNION ALL?  I'd be curious if the default is UNION DISTINCT
>> on that platform. That would at least partially explain postgres behaviour
>> leaving hive the odd man out.
>>
>>
>>
>> On Wed, Mar 12, 2014 at 6:47 AM, Martin Kudlej  wrote:
>>>
>>> Hi all,
>>>
>>> I've tried BigTop test for join_filters:
>>> CREATE TABLE myinput1(key int, value int);
>>> LOAD DATA LOCAL INPATH 'seed_data_files/in3.txt' INTO TABLE myinput1;
>>>
>>> where seed_data_files/in3.txt:
>>> 12  35
>>> NULL40
>>> 48  NULL
>>> 100 100
>>>
>>> I've tried:
>>> SELECT * FROM myinput1 a FULL OUTER JOIN myinput1 b on a.key > 40 AND
>>> a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND b.key =
>>> b.value ORDER BY a.key, a.value, b.key, b.value;
>>>
>>> and expected result in test is:
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 100 100
>>>
>>>
>>> but real hive result is:
>>> NULL  NULL  NULL  40
>>> NULL  NULL  12  35
>>> NULL  NULL  48  NULL
>>> NULL  40  NULL  NULL
>>> 12  35  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 100 100 100 100
>>>
>>> btw. result from postgresql is:
>>> (SELECT *
>>>   FROM myinput1 a
>>> LEFT JOIN
>>>   myinput1 b on
>>> a.key > 40 AND
>>> a.value > 50 AND
>>> a.key = a.value AND
>>> b.key > 40 AND
>>> b.value > 50 AND
>>> b.key = b.value  ORDER BY a.key, a.value, b.key, b.value)
>>> UNION (SELECT *
>>>   FROM myinput1 a
>>> RIGHT JOIN
>>>   myinput1 b on
>>> a.key > 40 AND
>>> a.value > 50 AND
>>> a.key = a.value AND
>>> b.key > 40 AND
>>> b.value > 50 AND
>>> b.key = b.value
>>> ORDER BY a.key, a.value, b.key, b.value);
>>>  |   |  12 |35
>>>   12 |35 | |
>>>  |   |  48 |
>>>   48 |   | |
>>>  |40 | |
>>>  |   | |40
>>>  100 |   100 | 100 |   100
>>>
>>> so it's the same like in hive.
>>>
>>> What is the right result for this full outer join in HiveQL, please?
>>>
>>> --
>>> Best Regards,
>>> Martin Kudlej.
>>> MRG/Grid & RHS-Hadoop Senior Quality Assurance Engineer
>>> Red Hat Czech s.r.o.
>>>
>>> Phone: +420 532 294 155
>>> E-mail:mkudlej at redhat.com
>>> IRC:   mkudlej at #brno, #messaging, #grid, #rhs, #distcomp
>>
>>
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread Navis류승우
Then you should use BETWEEN, not IN. BETWEEN can be used for predicate
pushdown (PPD), afaik.
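For example, the query from the thread below could be rewritten as
(assuming the UDF returns the partition key as an INT):

  SELECT * FROM mytable
  WHERE partitionCol BETWEEN my_udf("2014-03-10") AND my_udf("2014-03-11");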

2014-03-11 16:33 GMT+09:00 Petter von Dolwitz (Hem)
:
> Hi Young,
>
> I must argue that the partition pruning do actually work if I don't use the
> IN clause. What I wanted to achieve in my original query was to specify a
> range of partitions in a simple way. The same query can be expressed as
>
> SELECT * FROM mytable WHERE partitionCol >= UDF("2014-03-10") and
> partitionCol <= UDF("2014-03-11");
>
> This UDF returns an INT (rather than an INT array). Both this UDF and the
> original one are annotated with @UDFType(deterministic = true) (if that has
> any impact) . This variant works fine and does partition pruning. Note that
> I don't have another column as input to my UDF but a static value.
>
> Thanks,
> Petter
>
>
>
>
> 2014-03-11 0:16 GMT+01:00 java8964 :
>
>> I don't know from syntax point of view, if Hive will allow to do "columnA
>> IN UDF(columnB)".
>>
>> What I do know that even let's say above work, it won't do the partition
>> pruning.
>>
>> The partition pruning in Hive is strict static, any dynamic values
>> provided to partition column won't enable partition pruning, even though it
>> is a feature I missed too.
>>
>> Yong
>>
>> 
>> Date: Mon, 10 Mar 2014 16:23:01 +0100
>> Subject: Using an UDF in the WHERE (IN) clause
>> From: petter.von.dolw...@gmail.com
>> To: user@hive.apache.org
>>
>>
>> Hi,
>>
>> I'm trying to get the following query to work. The parser don't like it.
>> Anybody aware of a workaround?
>>
>> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>>
>> partitionCol is my partition column of type INT and I want to achieve
>> early pruning. I've tried returning an array of INTs from my_udf and also a
>> plain string in the format (1,2,3). It seems like the parser wont allow me
>> to put an UDF in this place.
>>
>> Any help appreciated.
>>
>> Thanks,
>> Petter
>
>


Re: Using an UDF in the WHERE (IN) clause

2014-03-10 Thread Navis류승우
(KW_IN expressions)
   -> ^(TOK_FUNCTION KW_IN $precedenceEqualExpression expressions)

expressions
:
LPAREN expression (COMMA expression)* RPAREN -> expression*
;

You should have the arguments of IN wrapped in parentheses. But it seems
not possible to use an array-returning expression there (it causes a type
mismatch in current Hive).

We might extend the IN function to accept a single array as an argument.
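A literal list does work today, for illustration of the accepted syntax
(this does not cover the UDF case from the question):

  SELECT * FROM mytable WHERE partitionCol IN (1, 2, 3);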


2014-03-11 8:16 GMT+09:00 java8964 :
> I don't know from syntax point of view, if Hive will allow to do "columnA IN
> UDF(columnB)".
>
> What I do know that even let's say above work, it won't do the partition
> pruning.
>
> The partition pruning in Hive is strict static, any dynamic values provided
> to partition column won't enable partition pruning, even though it is a
> feature I missed too.
>
> Yong
>
> 
> Date: Mon, 10 Mar 2014 16:23:01 +0100
> Subject: Using an UDF in the WHERE (IN) clause
> From: petter.von.dolw...@gmail.com
> To: user@hive.apache.org
>
>
> Hi,
>
> I'm trying to get the following query to work. The parser don't like it.
> Anybody aware of a workaround?
>
> SELECT * FROM mytable WHERE partitionCol IN my_udf("2014-03-10");
>
> partitionCol is my partition column of type INT and I want to achieve early
> pruning. I've tried returning an array of INTs from my_udf and also a plain
> string in the format (1,2,3). It seems like the parser wont allow me to put
> an UDF in this place.
>
> Any help appreciated.
>
> Thanks,
> Petter


Re: Limited capabilities of a custom input format

2014-03-04 Thread Navis류승우
You can override the input format with "set hive.input.format=xxx". But the
*HiveInputFormat classes do some internal work for Hive (predicate pushdown,
IO contexts, etc.), so it would not be easy to implement a new one (or to
override some of its methods). Still, you can try.
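As a sketch, the override itself is just a session setting (the class name
below is a placeholder for your own implementation, which must be on the
classpath):

  set hive.input.format=com.example.MyFilteringHiveInputFormat;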

I thought I saw an issue for supporting a custom location provider for
partitioned tables, but I cannot find it. It might be a bogus signal.

Thanks,
Navis


2014-03-05 0:00 GMT+09:00 Petter von Dolwitz (Hem) <
petter.von.dolw...@gmail.com>:

> Hi Navis,
>
> thanks for pointing this one out! It would for sure be one way around it.
> In my use case it would require adding this extra where clause for
> particular tables. I guess I can create a view to make this more
> transparent.
>
> Do you know why my import format is not used on the hadoop side? I'm sure
> this is by design but I wanted to understand why. Also, are you aware of
> any discussions supporting partitioning on file level rather than on
> directory level?
>
> Thanks,
> Petter
>
>
> 2014-03-04 7:32 GMT+01:00 Navis류승우 :
>
> You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
>> using predicate on file-name vc to filter out inputs. For example,
>>
>> select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME
>> rlike '.*/srcbucket2[03].txt'
>>
>> But it's not committed, yet.
>>
>> Thanks,
>>
>>
>>
>> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <
>> petter.von.dolw...@gmail.com>:
>>
>> Hi,
>>>
>>> I have implemented a few custom input formats in Hive. It seems like
>>> only the getRecordReader() method of these input formats is being called
>>> though, i.e. there is no way of overriding the listStatus() method and
>>> provide a custom input filter. The only way I can set a file filter is by
>>> using the mapred.input.pathFilter.class property which leaves me at using
>>> the same filter for all input formats. I would like a way to specify a
>>> filter per input format. Is there a way around this limitation?
>>>
>>> I am on Hive 0.10. I think I have seen that when running jobs locally
>>> that the listStatus() method of my input formats are called but not when
>>> handing over the job to a hadoop cluster. It seems like the listStatus is
>>> called on hadoops CombineFileInputFormat instead.
>>>
>>> Thanks,
>>> Petter
>>>
>>
>>
>


Re: How to solve the garbage problem

2014-03-04 Thread Navis류승우
Declare it as a binary column and use a decoding UDF when accessing it.
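A minimal sketch, assuming the GBK bytes are stored in a BINARY column
named content and gbk_decode is a user-provided UDF (not a built-in) that
converts GBK bytes to a UTF-8 string:

  SELECT gbk_decode(content) FROM gbk_table;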

Thanks,
Navis


2014-02-20 12:21 GMT+09:00 kun yan :

>
> Hi all
>
> GBK encoding data files, but the hive is UTF-8 encoding
>
> select * from table display normal
>
> I try to modify the following configuration
>
> system:file.encoding=UTF-8
> system:sun.jnu.encoding=UTF-8
>
> But no effect, how do I deal with it
>
>
> --
>
> In the Hadoop world, I am just a novice, explore the entire Hadoop
> ecosystem, I hope one day I can contribute their own code
>
> YanBit
> yankunhad...@gmail.com
>
>


Re: Query regarding Hive Parallel Orderby

2014-03-04 Thread Navis류승우
bq. Is my understanding correct?

Yes.

bq. Isn't it a possibility that storing the entire sample in memory would
become a bottleneck when the sample size is large?

Yes.

Thanks,


2014-02-21 13:32 GMT+09:00 Vaibhav Jain :

> Hi,
>
> Hive 12 has added the functionality of parallel order by. I have a few
> queries regarding the working of it.
> From the source code I have figured out that to do a parallel orderby , a
> partition table needs to created
> which is provided as an input to TotalOrderPartitioner.  To create the
> partition table, a sample of
> the hive table is stored as ArrayList of byte arrays and then sorted.
>
> So I have the following queries :
>
> 1)  Is my understanding correct?
>
> 2) Isn't it a possibility that storing the entire sample in memory would
> become a bottleneck when the sample size is large?
>
>
> --
> Thanks
> Vaibhav Jain
>


Re: Hive hbase handler composite key - hbase full scan on key

2014-03-03 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-6411 is exactly for these cases.

The bad news is that it does not seem to be included even in 0.13.0, and you
would have to implement your own predicate analyzer.

Thanks,
Navis


2014-03-03 20:52 GMT+09:00 Juraj jiv :

> Hello,
> im currently testing Hbase integration into Hive. I want to use fast hbase
> key lookup in Hive but my hbase key is composite.
> I found a solution how to crete table with hbase key as struct which work
> fine:
>
> CREATE EXTERNAL TABLE table_tst(
> key struct, 
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '_'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' ...
>
> But if i use this select in hive:
> select * from table_tst where key.a = '1407273705';
> It takes about 860 seconds to print 2 records. So it makes full scan :/
>
> If i use similar select from Java Hbase API as:
> Scan scan = new Scan();
> scan.setStartRow("1407273705".getBytes());
> scan.setStopRow("1407273705~".getBytes());
>
> Note: "~" is end char for me - it has high byte value, my composite key
> delimiter is "_"
> This select 2 records in 2 seconds.
>
> How can i tell Hive go with start/stop scanner over this key.a value...
>
> JV
>


Re: Limited capabilities of a custom input format

2014-03-03 Thread Navis류승우
You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
using predicate on file-name vc to filter out inputs. For example,

select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME rlike
'.*/srcbucket2[03].txt'

But it's not committed, yet.

Thanks,



2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <
petter.von.dolw...@gmail.com>:

> Hi,
>
> I have implemented a few custom input formats in Hive. It seems like only
> the getRecordReader() method of these input formats is being called though,
> i.e. there is no way of overriding the listStatus() method and provide a
> custom input filter. The only way I can set a file filter is by using the
> mapred.input.pathFilter.class property which leaves me at using the same
> filter for all input formats. I would like a way to specify a filter per
> input format. Is there a way around this limitation?
>
> I am on Hive 0.10. I think I have seen that when running jobs locally that
> the listStatus() method of my input formats are called but not when handing
> over the job to a hadoop cluster. It seems like the listStatus is called on
> hadoops CombineFileInputFormat instead.
>
> Thanks,
> Petter
>


Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

2014-03-01 Thread Navis류승우
Congratulations, Xuefu!


2014-03-01 14:38 GMT+09:00 Lefty Leverenz :

> Congrats Xuefu!
>
> -- Lefty
>
>
> On Fri, Feb 28, 2014 at 2:52 PM, Eric Hanson (BIG DATA) <
> eric.n.han...@microsoft.com> wrote:
>
>> Congratulations Xuefu!
>>
>> -Original Message-
>> From: Remus Rusanu [mailto:rem...@microsoft.com]
>> Sent: Friday, February 28, 2014 11:43 AM
>> To: d...@hive.apache.org; user@hive.apache.org
>> Cc: Xuefu Zhang
>> Subject: RE: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
>>
>> Grats!
>> 
>> From: Prasanth Jayachandran 
>> Sent: Friday, February 28, 2014 9:11 PM
>> To: d...@hive.apache.org
>> Cc: user@hive.apache.org; Xuefu Zhang
>> Subject: Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang
>>
>> Congratulations Xuefu!
>>
>> Thanks
>> Prasanth Jayachandran
>>
>> On Feb 28, 2014, at 11:04 AM, Vaibhav Gumashta 
>> wrote:
>>
>> > Congrats Xuefu!
>> >
>> >
>> > On Fri, Feb 28, 2014 at 9:20 AM, Prasad Mujumdar > >wrote:
>> >
>> >>   Congratulations Xuefu !!
>> >>
>> >> thanks
>> >> Prasad
>> >>
>> >>
>> >>
>> >> On Fri, Feb 28, 2014 at 1:20 AM, Carl Steinbach 
>> wrote:
>> >>
>> >>> I am pleased to announce that Xuefu Zhang has been elected to the
>> >>> Hive Project Management Committee. Please join me in congratulating
>> Xuefu!
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Carl
>> >>>
>> >>>
>> >>
>> >
>> > --
>> > CONFIDENTIALITY NOTICE
>> > NOTICE: This message is intended for the use of the individual or
>> > entity to which it is addressed and may contain information that is
>> > confidential, privileged and exempt from disclosure under applicable
>> > law. If the reader of this message is not the intended recipient, you
>> > are hereby notified that any printing, copying, dissemination,
>> > distribution, disclosure or forwarding of this communication is
>> > strictly prohibited. If you have received this communication in error,
>> > please contact the sender immediately and delete it from your system.
>> Thank You.
>>
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>>
>
>


Re: KryoException -missing no-arg constructor ASTNodeOrigin

2014-02-26 Thread Navis류승우
It's HIVE-5779 and will be fixed in hive-0.13.0.

Thanks,
Navis


2014-02-21 21:07 GMT+09:00 Rafal Janik :

> Hi All,
>
> I've just started my adventure with Hive so I'm not sure if it's an issue
> here or just my misunderstanding...
> I'm using Hortonworks Sandbox 2.0 (Hive 0.12.0.2.0.6.0-76)
> I'm following hortonworks spring-xd tutorial and the last step is to
> create a table as a select of two views (all other views and tables were
> created in hortonworks sandbox beeswax).
>
> So in hive console I've run:
>
> hive> create table test_abcd stored as RCFile AS select t.*, s.* from
> cytweets_clean t left outer join tweets_sentiment s on t.id=s.id;
>
> which raised the following exception:
>
> com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
> no-arg constructor): org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
> Serialization trace:
> origin (org.apache.hadoop.hive.ql.parse.ASTNode)
> children (org.apache.hadoop.hive.ql.parse.ASTNode)
> children (org.apache.hadoop.hive.ql.parse.ASTNode)
> expressionMap (org.apache.hadoop.hive.ql.parse.RowResolver)
> rr (org.apache.hadoop.hive.ql.parse.OpParseContext)
> opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
> mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
> at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097)
> ...
> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
> at org.apache.hadoop.hive.ql.exec.Utilities.
> deserializeObjectByKryo(Utilities.java:810)
> ...
> at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.
> compile(MapReduceCompiler.java:300)
> ...
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ...
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
>
> Am I doing something wrong here?
>
> regards
>
> rafal
>
>
>


Re: Hive Query :: Implementing case statement

2014-02-18 Thread Navis류승우
If the key is unique, you could emulate updates by overwriting values
through the HBase storage handler.
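A rough sketch under that assumption (table, column family and column names
are illustrative; only the join with TABLE_SQL_2 from the question is
shown):

  CREATE TABLE hbase_balances (key STRING, balance STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:balance");

  -- writing a row whose key already exists replaces the stored value,
  -- which gives update-like semantics
  INSERT OVERWRITE TABLE hbase_balances
  SELECT a.key, b.prev
  FROM table_hive a JOIN table_sql_2 b ON (a.key = b.key)
  WHERE a.code = '1';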


2014-02-18 22:05 GMT+09:00 yogesh dhari :

> Yes, Hive does not provide update statement, I am just looking for the
> work arround it, how to implement it
>
>
>
>
>
> On Tue, Feb 18, 2014 at 6:27 PM, Peter Marron <
> peter.mar...@trilliumsoftware.com> wrote:
>
>>  From https://cwiki.apache.org/confluence/display/Hive/Home
>>
>>
>>
>> "Hive is not designed for OLTP workloads and does not offer real-time
>> queries or row-level updates."
>>
>>
>>
>> As far as I am aware "UPDATE" isn't even in the Hive DML.
>>
>>
>>
>> Z
>>
>>  *Peter Marron*
>> Senior Developer
>> Trillium Software, A Harte Hanks Company
>>
>> Theale Court, 1st Floor, 11-13 High Street
>> Theale
>> RG7 5AH
>>
>> +44 (0) 118 940 7609 office
>> +44 (0) 118 940 7699 fax
>>
>>
>> trilliumsoftware.com  / 
>> linkedin
>> / twitter  / 
>> facebook
>>
>>
>>
>> *From:* yogesh dhari [mailto:yogeshh...@gmail.com]
>> *Sent:* 18 February 2014 12:39
>> *To:* user@hive.apache.org
>> *Subject:* Hive Query :: Implementing case statement
>>
>>
>>
>> Hello All,
>>
>>
>>
>> I have a use case where a table say TABLE_SQL is geting updated like.
>>
>>
>>
>>
>>
>> 1st Update Command
>>
>>
>>
>> update TABLE_SQL a
>>
>> set BALANCE = b.prev
>>
>> from TABLE_SQL_2 b
>>
>> where a.key = b.key and a.code = "1"
>>
>>
>>
>>
>>
>> 2nd Update Command
>>
>>
>>
>> update TABLE_SQL a
>>
>> set BALANCE = b.prev
>>
>> from TABLE_SQL_3 b
>>
>> where a.key = b.key and a.code = "2"
>>
>>
>>
>>
>>
>> same column is getting update twice in sql table,
>>
>>
>>
>> I have a Table in Hive say TABLE_HIVE.
>>
>>
>>
>> How to perform this kind operatation in HIVE,
>>
>>
>>
>> Pls Help,
>>
>>
>>
>> Thanks in Advance
>>
>> Yogesh Kumar
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Sampling from a single column

2014-02-12 Thread Navis류승우
If it should be sampled using subquery would be inevitable, something like,

select x from (select distinct key as x from src)a where rand() > 0.9 limit
10;



2014-02-12 6:07 GMT+09:00 Oliver Keyes :

> Hey all
>
> So, what I'm looking to do is get N randomly-sampled distinct values from
> a column in a table. I'm kind of flummoxed by how to do this without using
> TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be
> 'select these values, from this sample, from these distinct values'). I
> could swear I saw a simple sample() function while browsing the
> documentation just last week, but I'll be damned if I can find it again.
> Can anyone help me out, or is Yet Another Subquery the way to go?
>
> Thanks!
>


Re: Issue with Hive and table with lots of column

2014-02-12 Thread Navis류승우
With HIVE-3746, which will be included in hive-0.13, HiveServer2 takes less
memory than before.

Could you try it with the version in trunk?


2014-02-13 10:49 GMT+09:00 Stephen Sprague :

> question to the original poster.  closure appreciated!
>
>
> On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague wrote:
>
>> thanks Ed. And on a separate tact lets look at Hiveserver2.
>>
>>
>> @OP>
>>
>> *I've tried to look around on how i can change the thrift heap size but
>> haven't found anything.*
>>
>>
>> looking at my hiveserver2 i find this:
>>
>>$ ps -ef | grep -i hiveserver2
>>dwr   9824 20479  0 12:11 pts/100:00:00 grep -i hiveserver2
>>dwr  28410 1  0 00:05 ?00:01:04
>> /usr/lib/jvm/java-6-sun/jre/bin/java 
>> *-Xmx256m*-Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log
>> -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=
>> -Dhadoop.root.logger=INFO,console
>> -Djava.library.path=/usr/lib/hadoop/lib/native
>> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
>> /usr/lib/hive/lib/hive-service-0.12.0.jar
>> org.apache.hive.service.server.HiveServer2
>>
>>
>>
>>
>> questions:
>>
>>1. what is the output of "ps -ef | grep -i hiveserver2" on your
>> system? in particular what is the value of -Xmx ?
>>
>>2. can you restart your hiveserver with -Xmx1g? or some value that
>> makes sense to your system?
>>
>>
>>
>> Lots of questions now.  we await your answers! :)
>>
>>
>>
>> On Fri, Jan 31, 2014 at 11:51 AM, Edward Capriolo 
>> wrote:
>>
>>> Final table compression should not effect the de serialized size of the
>>> data over the wire.
>>>
>>>
>>> On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague wrote:
>>>
 Excellent progress David.   So.  What the most important thing here we
 learned was that it works (!) by running hive in local mode and that this
 error is a limitation in the HiveServer2.  That's important.

 so textfile storage handler and having issues converting it to ORC.
 hmmm.

 follow-ups.

 1. what is your query that fails?

 2. can you add a "limit 1" to the end of your query and tell us if that
 works? this'll tell us if it's column or row bound.

 3. bonus points. run these in local mode:
   > set hive.exec.compress.output=true;
   > set mapred.output.compression.type=BLOCK;
   > set
 mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
   > create table blah stored as ORC as select * from >>> table>;   #i'm curious if this'll work.
   > show create table blah;  #send output back if previous step
 worked.

 4. extra bonus.  change ORC to SEQUENCEFILE in #3 see if that works any
 differently.



 I'm wondering if compression would have any effect on the size of the
 internal ArrayList the thrift server uses.



 On Fri, Jan 31, 2014 at 9:21 AM, David Gayou wrote:

> Ok, so here are some news :
>
> I tried to boost the HADOOP_HEAPSIZE to 8192,
> I also setted the mapred.child.java.opts to 512M
>
> And it doesn't seem's to have any effect.
>  --
>
> I tried it using an ODBC driver => fail after few minutes.
> Using a local JDBC (beeline) => running forever without any error.
>
> Both through hiveserver 2
>
> If i use the local mode : it works!   (but that not really what i
> need, as i don't really how to access it with my software)
>
> --
> I use a text file as storage.
> I tried to use ORC, but i can't populate it with a load data  (it
> return an error of file format).
>
> Using an "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC" after
> populating the table, i have a file format error on select.
>
> --
>
> @Edward :
>
> I've tried to look around on how i can change the thrift heap size but
> haven't found anything.
> Same thing for my client (haven't found how to change the heap size)
>
> My usecase is really to have the most possible columns.
>
>
> Thanks a lot for your help
>
>
> Regards
>
> David
>
>
>
>
>
> On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <
> edlinuxg...@gmail.com> wrote:
>
>> Ok here are the problem(s). Thrift has frame size limits, thrift has
>> to buffer rows into memory.
>>
>> Hove thrift has a heap size, it needs to big in this case.
>>
>> Your client needs a big heap size as well.
>>
>> The way to do this query if it is possible may be turning row
>> lateral, potwntially by treating it as a list, it will make queries on it
>> awkward.
>>
>> Good luck
>>
>>
>> On Thursday, January 30, 2014, Stephen Sprague 
>> wrote:
>> > oh. thinking some more about this i forgot to 

Re: Hbase + Hive scan performance

2014-02-10 Thread Navis류승우
The HBase storage handler uses its own InputFormat.
So, hbase.client.scanner.caching (which is used in hbase.TableInputFormat)
does not work. It might be configurable via HIVE-2906, something like
"select empno, ename from hbase_emp ('hbase.scan.cache'='1000')". But I've
not tried.

bq. Is there any change in the hive (0.9) do the same as..
It might not be.

bq. why we have this Hive Jira to fix the Hbase scan cache and marked ONLY
fixed in Hive 0.12..
Sorry for that. Hive is still in a rapidly evolving state, so maintenance
releases are generally not provided.

bq. hive setting can do the same as 2nd line code
It's configurable via "hbase.scan.cacheblock"
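If the HIVE-2906 style per-query properties are available in your build,
the two settings could in principle be combined, e.g. (untested):

  select empno, ename from hbase_emp
    ('hbase.scan.cache'='1000', 'hbase.scan.cacheblock'='false');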

p.s. I regret that the configuration name is not identical to HBase's, but
it's already done.

Thanks,



2014-02-11 4:22 GMT+09:00 java8964 :

> Hi,
>
> I know this has been asked before. I did google around this topic and
> tried to understand as much as possible, but I kind of got difference
> answers based on different places. So I like to ask what I have faced and
> if someone can help me again on this topic.
>
> I created one table with one column family with 20+ columns in the hive.
> It is populated around 150M records from a 20G csv file.
> What I want to check if how fast I can get for a full scan in MR job from
> the Hbase table.
>
> It is running in a 10 nodes hadoop cluster (With Hadoop 1.1.1 + Hbase
> 0.94.3 + Hive 0.9) , 8 of them as Data + Task nodes, and one is NN and
> Hbase master, and another one is running 2nd NN.
>
> 4 nodes of 8 data nodes also run Hbase region servers.
>
> I use the following code example to get row count from a MR job,
> http://hbase.apache.org/book/mapreduce.example.html
> At first, the mapper tasks run very slow, as I commented out the following
> 2 lines on purpose:
>
> scan.setCaching(1000);// 1 is the default in Scan, which will be bad 
> for MapReduce jobs
> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>
>
> Then I added the above 2 lines, I almost get 10X faster compared to the
> first run. That's good, it proved to me that above 2 lines are important
> for Hbase full scan.
>
> Now the question comes to in Hive.
>
> I already created the table in the Hive linking to the Hbase table, then I
> started my hive session like this:
>
> hive --auxpath
> $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-0.94.3.jar,$HIVE_HOME/lib/zookeeper-3.4.5.jar,$HIVE_HOME/lib/guava-r09.jar
> -hiveconf hbase.master=Hbase_master:port
>
> If I run this query "select count(*) from table", I can see the mappers
> performance is very bad, almost as bad as my 1st run above.
>
> I searched this mailing list, it looks like there is a setting in Hive
> session to change the scan caching size, same as 1st line of above code
> base, from here:
>
>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCAGpTDNfn11jZAJ2mfboEqkfudXaU9HGsY4b=2x1spwf4qmu...@mail.gmail.com%3E
>
> So I add the following settings in my hive session:
>
> set hbase.client.scanner.caching=1000;
>
> To my surprise, after this setting in hive session, the new MR job
> generated from the Hive query still very slow, same as before this settings.
>
> Here is what I found so far:
>
> 1) In my owner MR code, before I add the 2 lines of code change or after,
> in the job.xml of MR job, I both saw this setting in the job.xml:
> hbase.client.scanner.caching=1
> So this setting is the same in both run, but the performance improved
> great after the code change.
>
> 2) In hive run, I saw the setting "hbase.client.scanner.caching" changed
> from 1 to 1000 in job.xml, which is what I set in the hive session, but
> performance has not too much change. So the setting was changed, but it
> didn't help the performance as I expected.
>
> My questions are following:
>
> 1) Is there any change in the hive (0.9) do the same as the 1st line of
> code change? From google and hbase document, it looks like the above
> configuration is the one, but it didn't help me.
> 2) Even assume the above setting is correct, why we have this Hive Jira to
> fix the Hbase scan cache and marked ONLY fixed in Hive 0.12? The Jira
> ticket is here: https://issues.apache.org/jira/browse/HIVE-3603
> 3) Is there any hive setting can do the same as 2nd line code change
> above? If so, what is it? I google around and cannot find one.
>
> Thanks
>
> Yong
>


Re: External table reference subDirectories

2014-02-05 Thread Navis류승우
It's supposed to be implemented after
https://issues.apache.org/jira/browse/HIVE-1662 is checked in, but that
issue has not been in progress for a year.

Thanks,


2014-02-06 John Meza :

> A couple of simple questions on logfile organization in HDFS and
> referenced by an external table.
>
> 1.An external table location must sit on top of the HDFS directory
> containing the logfiles.
> It can not sit 2 HDFS directory levels above the logfiles and reference.
> Ex. logfiles organized by
>   /logfile/year/month/day
>   /logfile/2013/05/20/2013-05-20-1-nn.tsv
>   /logfile/2013/05/20/2013-05-20-2-nn.tsv
>   
> The external table location can't be '/logfile/2013/05';
>
> 2. is #1 related to https://issues.apache.org/jira/browse/HIVE-1083?
> thanks
> John
>


Re: Map-side join memory limit is too low

2014-02-02 Thread Navis류승우
try "set hive.mapred.local.mem=7000" or add it to hive-site.xml instead of
modifying hive-env.sh

HADOOP_HEAPSIZE is not in use. Should fix documentation of it.

Thanks,
Navis


2014-01-31 Avrilia Floratou :

> Hi,
> I'm running hive 0.12 on yarn and I'm trying to convert a common join into
> a map join. My map join fails
> and from the logs I can see that the memory limit is very low:
>
>  Starting to launch local task to process map join;  maximum memory =
> 514523136
>
> How can I increase the maximum memory?
> I've set the HADOOP_HEAP_SIZE at 7GB in hadoop-env.sh and hive-env.sh but
> that didn't help.
> Also the nodemanager runs with 7GB heap size.
>
> Is there anything else I can do to increase this value?
>
> Thanks,
> Avrilia
>


Re: Performance problem with HBase

2014-02-02 Thread Navis류승우
1. The current implementation of the HBase handler cannot push down a
filter with a 'like' expression. You might rewrite the query as something
like "key >= '0010_0' AND key <= '0010_9'".
2. Each of the tasks seems to be scanning the whole table (i.e. 1000+
times), which is fixed by HIVE-3420 (not in a released version), but I
cannot be sure of it.

Thanks,

Navis


2014-01-30 Бородин Владимир :

> Hi all!
>
> I'm having a performance problem with quering data from hbase using hive.
> I use CDH 4.5 (hbase-0.94.6, hive-0.10.0 and hadoop-yarn-2.0.0) on a
> cluster of 10 hosts. Right now it stores 3 TB of data in hbase table which
> now consists of 1000+ regions. One record in it looks like this:
>
> hbase(main):002:0> get 'users_history',
> '0010_18446742684488356353'
> COLUMN   CELL
>
>
>  cf:affected timestamp=1389221195263,
> value=1
>
>  cf:date timestamp=1389221195263,
> value=1389221195262
>
>  cf:hidden   timestamp=1389221195263,
> value=0
>
>  cf:ip   timestamp=1389221195263,
> value=95.47.182.98
>
>  cf:module   timestamp=1389221195263,
> value=wmi
>
>  cf:operationtimestamp=1389221195263,
> value=create
>
>  cf:statetimestamp=1389221195263,
> value=206003962075906->206003962075906
>
>  cf:target   timestamp=1389221195263,
> value=message
>
> 8 row(s) in 0.0200 seconds
>
> hbase(main):003:0>
>
>
> I have created the appropriate table in hive like that:
>
> create external table hbase_users_history(key string, affected int,
> cf_date string, hidden int, ip string, module string, operation string,
> state string, target string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties ("hbase.columns.mapping" =
> ":key,cf:affected,cf:date,cf:hidden,cf:ip,cf:module,cf:operation,cf:state,cf:target")
> tblproperties("hbase.table.name" = "users_history");
>
>
> If I run query getting the data by full key it creates one mapper and runs
> very fast:
>
> hive> select * from hbase_users_history where
> key='0010_18446742684488356353';
>
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1386711334339_0015, Tracking URL =
> http://historydb07d.mail.yandex.net:8088/proxy/application_1386711334339_0015/
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1386711334339_0015
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 0
> 2014-01-14 09:23:00,392 Stage-1 map = 0%,  reduce = 0%
> 2014-01-14 09:23:06,752 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 3.23 sec
> MapReduce Total cumulative CPU time: 3 seconds 230 msec
> Ended Job = job_1386711334339_0015
> MapReduce Jobs Launched:
> Job 0: Map: 1   Cumulative CPU: 3.23 sec   HDFS Read: 354 HDFS Write: 122
> SUCCESS
> Total MapReduce CPU Time Spent: 3 seconds 230 msec
> OK
> 0010_18446742684488356353   1   NULL0
> 95.47.182.98wmi create  206003962075906->206003962075906
>  message
> Time taken: 21.265 seconds
> hive>
>
>
> If I run query getting data by first part of the key it creates 1008 maps
> and takes some king of 8-10 hours which seems to be very slow:
>
> hive> select * from hbase_users_history where key like 
> '0010_%';
> <...>
> 2014-01-10 18:40:55,485 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 
> 732857.15 sec
> 2014-01-10 18:40:56,519 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 
> 732858.27 sec
> MapReduce Total cumulative CPU time: 8 days 11 hours 34 minutes 18 seconds 
> 270 msec
> Ended Job = job_1386711334339_0004
> MapReduce Jobs Launched:
> Job 0: Map: 1008   Cumulative CPU: 732858.27 sec   HDFS Read: 301742 HDFS 
> Write: 18395 SUCCESS
> Total MapReduce CPU Time Spent: 8 days 11 hours 34 minutes 18 seconds 270 msec
> OK
>
> 
>
> Time taken: 34505.449 seconds
> hive>
>
> The result is the same if I do "where key between ...".
>
>
> Explain on this query looks like that:
>
> > explain select * from hbase_users_history where key like
> '0010_%';
> OK
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME hbase_users_history)))
> (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT
> (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (like (TOK_TABLE_OR_COL key)
> '0010_%'
>
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> hbase_users_history
>   TableScan
> alias: hbase_users_history
> Filter Operator
>   predicate:
>   expr: (key 

Re: DESCRIBE EXTENDED show numRows=0

2014-02-02 Thread Navis류승우
Could you check the task log?

For stats gathering, Hive uses Derby by default. If the JDBC driver for
Derby is not in auxlib, the tasks cannot publish stats.


2014-01-30 Stephen Sprague :

> the answer to this would seemingly be no.  i just tried it in hive v0.12.
>
> numRows=0 before and numRows=0 after my running of "analyze table 
> compute statistics"
>
> other values are populated though just not numRows. I wonder why that is.
>
> Cheers,
> Stephen
>
> {noformat}
>
> parameters:{numPartitions=0, numFiles=420, last_modified_by=dwr,
> last_modified_time=1390986197, transient_lastDdlTime=1391060001,
> totalSize=10748060517, *numRows=0,* rawDataSize=0},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
>
> {noformat}
>
>
> On Mon, Jan 27, 2014 at 3:52 AM, Lefty Leverenz 
> wrote:
>
>> Can the ANALYZE statement be used to gather statistics if
>> hive.stats.autogather was 'false' when the data was loaded?  (See the
>> wiki's Statistics in Hive doc:  Existing 
>> Tables<https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables>
>> .)
>>
>> -- Lefty
>>
>>
>> On Sun, Jan 26, 2014 at 8:01 PM, Navis류승우  wrote:
>>
>>> If the data is loaded into table with insert clause with
>>> hive.stats.autogather=true, it will be updated. If it's not, it's zero (or
>>> -1 sometimes).
>>>
>>>
>>> 2014-01-23 Haroon Muhammad 
>>>
>>> Hi,
>>>>
>>>> I have observed that DESCRIBE EXTENDED always shows number of rows to
>>>> be zero despite the fact that the table has data. Is it a bug? Is it known
>>>> ? Has anyone else also come across the same ?
>>>>
>>>> Thanks,
>>>>
>>>
>>>
>>
>


Re: Using Hive metastore as general purpose RDBMS

2014-01-26 Thread Navis류승우
I've heard of similar use cases from the NCSoft (a big game company in
Korea) platform team, but they might be using their own StorageHandler
(and HiveStoragePredicateHandler) implementation.

We might introduce a new injection point for partition pruning if you can
implement the logic via an interface similar to HiveStoragePredicateHandler.

public interface HiveStoragePredicateHandler {
  public DecomposedPredicate decomposePredicate(
JobConf jobConf,
Deserializer deserializer,
ExprNodeDesc predicate);
}


2014-01-23 Petter von Dolwitz (Hem) 

> Hi Alan,
>
> thank you for your reply. The loose idea I had was to store one row in the
> RDBMS per Hive partition so I don't think the size will be an issue
> (expecting 3000 partitions or so). The end goal was to help to decide which
> partitions that are relevant for a query. Something like adding partition
> info to the WHERE clause behind the scenes. The way the data is structured
> we currently need to look up which partitions to use elsewhere.
>
> I'll look into ORC for sure. Currently we do not use any of the provided
> file formats but have implemented our own InputFormat that read gzip:ed
> protobufs. I suspect that we later on should investigate a possible
> performance gain coming from moving to a another file format.
>
> Petter
>
>
> 2014/1/22 Alan Gates 
>
>> HCatalog is definitely not designed for this purpose.  Could you explain
>> your use case more fully?  Is this indexing for better query planning or
>> faster file access?  If so, you might look at some of the work going on in
>> ORC, which is storing indices of its data in the format itself for these
>> purposes.  Also, how much data do you need to store?  Even index size on a
>> Hadoop scale data can quickly overwhelm MySQL or Postgres (which is what
>> most people use for their metastores) if you are keeping per row
>> information.  If you truly want to access an RDBMS as if it were an
>> external data store, you could implement a HiveStorageHandler for your
>> RDBMS.
>>
>> Alan.
>>
>> On Jan 22, 2014, at 2:02 AM, Petter von Dolwitz (Hem) <
>> petter.von.dolw...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I have a case where I would like to extend Hive to use information from
>> a regular RDBMS. To limit the complexity of the installation I thought I
>> could piggyback on the already existing metatstore.
>> >
>> > As I understand it, HCatalog is not built for this purpose. Is there
>> someone out there that has a similar usecase or have any input on how this
>> is done or if it should be avoided?
>> >
>> > The use case is to look up which partitions that contain certain data.
>> >
>> > Thanks,
>> > Petter
>> >
>> >
>>
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified
>> that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
>> immediately
>> and delete it from your system. Thank You.
>>
>
>


Re: DESCRIBE EXTENDED show numRows=0

2014-01-26 Thread Navis류승우
If the data is loaded into the table with an INSERT clause while
hive.stats.autogather=true, the row count will be updated. If it's not, it
stays zero (or sometimes -1).
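A minimal illustration (table names are placeholders):

  set hive.stats.autogather=true;
  INSERT OVERWRITE TABLE mytable SELECT * FROM staging;
  -- numRows is published as part of this insert
  DESCRIBE EXTENDED mytable;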


2014-01-23 Haroon Muhammad 

> Hi,
>
> I have observed that DESCRIBE EXTENDED always shows number of rows to be
> zero despite the fact that the table has data. Is it a bug? Is it known ?
> Has anyone else also come across the same ?
>
> Thanks,
>


Re: Why hiveserver2 is much slower than hiveserver1?

2014-01-18 Thread Navis류승우
HIVE-3746 modified the IDL to handle nulls in a more performant manner
(HiveServer1 does not handle nulls).

Thanks.



2013/11/6 B C 

> hi,
>
> We are building ms sql cube by linkedserver connecting to hiveserver with
> Cloudera's ODBC driver.
>
> There are two test results:
> 1. hiveserver1 running on 2CPUs, 8G mem, took about 8 hours
> 2. hiveserver2 running on 4CPUs, 16 mem, took about 13 hours and 27min
> (almost failed on machine with 2CPUs, 8G mem)
>
> Although in both cases almost all CPUs are busy when building the cube,
> I cannot understand why hiveserver2 is much slower than hiveserver1;
> according to the docs, HS2 supports concurrency, so shouldn't it be faster
> than HS1?
>
> Thanks.
>
> CDH4.3 on CentOS6.
>
>


Re: Pointing multiple external tables to the same location

2014-01-15 Thread Navis류승우
I think this is back to the original problem.

What you want is a separate scan (task) for each view, but Hive does not work
like that. If two tables or views (or a mix of them) have the same location,
they are regarded as the same table with the same table description (which is
overridden by the last table or view visited).

As I suggested in my first reply, an HDFS link might help work around this.



2014/1/14 Petter von Dolwitz (Hem) 

> I'm using Hive 0.10 (the version bundled with CDH4.4).
>
> The explain at my end looks similar to yours. I guess my real concern is
> around the way I have implemented the filters.
>
> This is how I have done it:
> - In the constructor of my RecordReader I read the property
> hive.io.filter.expr.serialized and use the IndexPredicateAnalyzer to find
> out what parts of the filter that I can apply in my RecordReader.
> - I process only the rows that match the filter.
>
> Since the filter represented in hive.io.filter.expr.serialized only
> contains one of the filters (column1 < 100 in the example above) the rows
> matching the other filter (column1 < 30) is lost. This specific example is
> overlapping so I'm not sure if the result points out the problem (column1 <
> 30 is covered by column1 < 100). In the example at my end the filters are
> not overlapping.
>
> Is the RecordReader the correct place to implement this filter? Should it
> work or should the filter integration be done at another level? For the
> example above, what did you expect hive.io.filter.text to contain?
>
> I might add that the tables are partitioned if that makes any difference.
> I originally had filter negotiation in place in a StorageHandler but
> StorageHandler did not support partitions so I switched to implementing the
> filter directly in the RecordReader. In the RecordReader I cannot negotiate
> filter with Hive but I can apply the filter that I can handle to prune data
> early.
>
> Thank you for your support,
> Petter
>
>
>
>
>
>
>
>
>
>
> 2014/1/14 Navis류승우 
>
>> In my test, it worked (and it should).
>>
>> CREATE EXTERNAL TABLE MasterTable (
>>   column1 STRING, column2 STRING)
>>   LOCATION 'hdfs://localhost:9000/home/navis/my_location';
>>
>> CREATE VIEW IF NOT EXISTS View1 (column1, column2) AS SELECT column1,
>> column2 FROM MasterTable WHERE column1<30;
>>
>> CREATE VIEW IF NOT EXISTS View2 (column1, column2) AS SELECT column1,
>> column2 FROM MasterTable WHERE column1<100;
>>
>> SELECT View1.* FROM View1 JOIN View2 ON (View1.column1 = View2.column1);
>>
>> Below is the result of EXPLAIN: a single full scan of the master table,
>> handled by two TableScan operators (TS), each followed by a Filter operator
>> (FIL) with the expected predicate.
>>
>> view1:view1:mastertable
>>   TableScan
>> alias: mastertable
>> Filter Operator
>>   predicate:
>>   expr: (column1 < 30)
>>   type: boolean
>>
>> view2:view2:mastertable
>>   TableScan
>> alias: mastertable
>> Filter Operator
>>   predicate:
>>   expr: (column1 < 100)
>>   type: boolean
>>
>>   Truncated Path -> Alias:
>> hdfs://localhost:9000/home/navis/my_location
>> [view1:view1:mastertable, view2:view2:mastertable]
>>
>> Can I ask the version of hive you are using?
>>
>>
>> 2014/1/9 Petter von Dolwitz (Hem) 
>>
>> Hi Navis (and others),
>>>
>>> seems like my solution with views does not work after all. That is, it
>>> works fine as long as I do not use filter pushdown. My setup is something
>>> like below:
>>>
>>> CREATE EXTERNAL TABLE MasterTable (
>>>   column1 STRING,
>>>   column2 STRING,
>>>   column3 STRING
>>>   column4 STRING)
>>>   PARTITIONED BY (partition INT)
>>>   ROW FORMAT SERDE 'MySerde'
>>>   STORED AS INPUTFORMAT 'MyInputFormat' OUTPUTFORMAT
>>> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>>>   LOCATION 'my_location';
>>>
>>> CREATE VIEW IF NOT EXISTS View1
>>>   (column1, column2, column3, column4, partition)
>>> PARTITIONED ON (partition)
>>> AS SELECT column1, column2, column3, column4, partition
>>> FROM MasterEventTable
>>> WHERE column1='value1' AND column2='value2';
>>>
>>> CREATE VIEW IF NOT EXISTS View2
>>>   (column1, column2, column3, column4, partition)
>

Re: Pointing multiple external tables to the same location

2014-01-13 Thread Navis류승우
In my test, it worked (and it should).

CREATE EXTERNAL TABLE MasterTable (
  column1 STRING, column2 STRING)
  LOCATION 'hdfs://localhost:9000/home/navis/my_location';

CREATE VIEW IF NOT EXISTS View1 (column1, column2) AS SELECT column1,
column2 FROM MasterTable WHERE column1<30;

CREATE VIEW IF NOT EXISTS View2 (column1, column2) AS SELECT column1,
column2 FROM MasterTable WHERE column1<100;

SELECT View1.* FROM View1 JOIN View2 ON (View1.column1 = View2.column1);

Below is the result of EXPLAIN: a single full scan of the master table,
handled by two TableScan operators (TS), each followed by a Filter operator
(FIL) with the expected predicate.

view1:view1:mastertable
  TableScan
alias: mastertable
Filter Operator
  predicate:
  expr: (column1 < 30)
  type: boolean

view2:view2:mastertable
  TableScan
alias: mastertable
Filter Operator
  predicate:
  expr: (column1 < 100)
  type: boolean

  Truncated Path -> Alias:
hdfs://localhost:9000/home/navis/my_location
[view1:view1:mastertable, view2:view2:mastertable]

Can I ask the version of hive you are using?


2014/1/9 Petter von Dolwitz (Hem) 

> Hi Navis (and others),
>
> seems like my solution with views does not work after all. That is, it
> works fine as long as I do not use filter pushdown. My setup is something
> like below:
>
> CREATE EXTERNAL TABLE MasterTable (
>   column1 STRING,
>   column2 STRING,
>   column3 STRING
>   column4 STRING)
>   PARTITIONED BY (partition INT)
>   ROW FORMAT SERDE 'MySerde'
>   STORED AS INPUTFORMAT 'MyInputFormat' OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>   LOCATION 'my_location';
>
> CREATE VIEW IF NOT EXISTS View1
>   (column1, column2, column3, column4, partition)
> PARTITIONED ON (partition)
> AS SELECT column1, column2, column3, column4, partition
> FROM MasterEventTable
> WHERE column1='value1' AND column2='value2';
>
> CREATE VIEW IF NOT EXISTS View2
>   (column1, column2, column3, column4, partition)
> PARTITIONED ON (partition)
> AS SELECT column1, column2, column3, column4, partition
> FROM MasterEventTable
> WHERE column1='value3' AND column2='value4';
>
>
> The following query works fine without filter pushdown:
> SELECT View1.* FROM View1 JOIN View2 ON (View1.column3 = View2.column3);
>
> Now if I enable filter pushdown (setting hive.optimize.index.filter=true)
> and apply the filter in my record reader I do not get the correct result. I
> do not get any records back at all. It seems like only the second filter
> (column1='value3' AND column2='value4) is pushed to my record reader. The
> underlying file is only traversed once. I would have expected that I either
> got an OR expression down ((column1='value3' AND column2='value4) OR
> (column1='value1' AND column2='value2)) or that the underlying file was
> scanned twice with each separate expression.
>
> Do you have any thoughts on this?
>
> Thanks,
> Petter
>
>
>
>
>
> 2013/12/22 Petter von Dolwitz (Hem) 
>
> Hi Navis,
>>
>> thank you for sorting this out! I have tried getting around this by using
>> views towards a single master table instead in combination with UDFs
>>  instead . Seems to work so far.
>>
>> /Petter
>>
>>
>> 2013/12/18 Navis류승우 
>>
>>> Hive uses a path-to-table (or partition) mapping internally (you can see
>>> that in MapredWork, etc.), which may cause the first table to be
>>> overwritten by the other.
>>>
>>> I haven't tried symlinks on HDFS, which could be a solution.
>>>
>>>
>>>
>>> 2013/12/12 Petter von Dolwitz (Hem) 
>>>
>>> Hi,
>>>>
>>>> I have declared several external tables pointing to the same location.
>>>> The things that tells these tables apart (apart from their names) is that
>>>> they have unique properties. These properties help me choose the correct
>>>> rows from the underlying file. I use a single storage handler (accompanied
>>>> by a single InputFormat and a single Serde) . The first columns in all
>>>> tables are the same but the last (a struct) is unique and
>>>> is constructed from the Serde (with help of the serde properties). A
>>>> simplified version of the tables look like so:
>>>>
>>>> CREATE EXTERNAL TABLE Table1 (
>>>>   column1 STRING,
>>>>   column2 STRING)
>>>>   STORED BY 'MyStorageHandler'
&

Re: OOM/GC limit Error

2013-12-29 Thread Navis류승우
Could you post the Hive version and the execution plan for the query?
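
The plan can be captured by prefixing the statement with EXPLAIN (or EXPLAIN
EXTENDED for more detail), e.g.:

EXPLAIN EXTENDED
SELECT tbl1.col_pk, tbl2.col1, tbl2.col2,
       SUM(tbl1.col4), SUM(tbl1.col5), SUM(tbl1.col4 + col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011','2012','2013')
GROUP BY tbl1.col_pk, tbl2.col1, tbl2.col2;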


2013/12/21 Martin, Nick 

>  Hi all,
>
>
>
> I have two tables:
>
>
>
> tbl1: 81m rows
>
> tbl2: 4m rows
>
>
>
> tbl1 is partitioned on one column and tbl2 has none.
>
>
>
>
> I’m attempting the following query:
>
>
>
> SELECT
>
> tbl1.col_pk,
>
> tbl2.col1,
>
> tbl2.col2,
>
> SUM(tbl1.col4),
>
> SUM(tbl1.col5),
>
> SUM(tbl1.col4+col5)
>
> FROM tbl2
>
> JOIN tbl1 ON (tbl1.col_pk=tbl2.col_pk)
>
> WHERE tbl1.partitioned_col IN ('2011','2012','2013')
>
> GROUP BY
>
> tbl1.col_pk,
>
> tbl2.col1,
>
> tbl2.col2;
>
>
>
> I get this error:
>
>
>
> OutOfMemoryError: GC overhead limit exceeded
>
>
>
> So, I followed the suggestion at the end of the error output (Currently
> hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower
> value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;') through several
> iterations, eventually getting my hive.map.aggr.hash.percentmemory setting
> down to something like .0165 and it still failed.
>
>
>
> I did some searching and found some convoluted recommendations of what to
> try next. Some mentioned upping my heap size, some mentioned re-writing my
> query, etc. I upped my Hadoop maximum Java heap size to 4096mb ,re-ran, and
> got the same results.
>
>
>
> Currently, some relevant settings are:
>
>
>
> NameNode Heap Size: 4096mb
>
> DataNode maximum Java heap size: 4096mb
>
> Hadoop maximum Java heap size: 4096mb
>
> Java Options for MapReduce tasks: 768mb
>
>
>
> I have 16 map slots and 8 reduce slots available (5 node cluster, 4 data
> and one name)
>
>
>
> Thanks in advance for the help,
>
> Nick
>


Re: Pointing multiple external tables to the same location

2013-12-17 Thread Navis류승우
Hive uses a path-to-table (or partition) mapping internally (you can see that
in MapredWork, etc.), which may cause the first table to be overwritten by the
other.

I haven't tried symlinks on HDFS, which could be a solution.



2013/12/12 Petter von Dolwitz (Hem) 

> Hi,
>
> I have declared several external tables pointing to the same location. The
> things that tells these tables apart (apart from their names) is that they
> have unique properties. These properties help me choose the correct rows
> from the underlying file. I use a single storage handler (accompanied by a
> single InputFormat and a single Serde) . The first columns in all tables
> are the same but the last (a struct) is unique and
> is constructed from the Serde (with help of the serde properties). A
> simplified version of the tables look like so:
>
> CREATE EXTERNAL TABLE Table1 (
>   column1 STRING,
>   column2 STRING)
>   STORED BY 'MyStorageHandler'
>   WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass1')
>   LOCATION 'mylocation'
>   TBLPROPERTIES('recordreader.filter'='table1_filter');
>
> CREATE EXTERNAL TABLE Table2 (
>   column1 STRING,
>   column2 STRING)
>   STORED BY 'MyStorageHandler'
>   WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass2')
>   LOCATION 'mylocation'
>   TBLPROPERTIES('recordreader.filter'='table2_filter');
>
>
> All works well for simple select queries towards the two tables. The
> following query gives very strange results though:
>
> SELECT * FROM (
>   SELECT column1,'Table1' FROM Table1 WHERE column2 = 'myValue'
>   union all
>   SELECT column1,'Table2' FROM Table2 WHERE column2 = 'myValue'
>   ) my_union
> ORDER BY my_union.column1
>
>
> It seems like one job task is created per file stored in the table
> location. This task gets the table properties from the second table and in
> the SerDe-step later on it seems like the records gets mixed up.
>
> I would have expected that hive would need to iterated the source files
> two times using two different tasks (with the correct table properties
> passed) in order to get this to work.
>
> Anyone here that can shed some light on this scenario?
>
> Thanks,
> Petter
>
>
>
>
>
>
>


Re: Limitations in the IndexPredicateAnalyzer

2013-12-17 Thread Navis류승우
IndexPredicateAnalyzer in Hive supports only AND conjunctions because that
keeps it simple. Anyone can implement one that supports OR, CASE, etc. if
needed. If you could contribute that to the Hive community, it would be really
appreciated.

ps.
There is a draft patch handling OR-conjoined predicates in the HBase handler
(thanks to Teddy).
I'll arrange for it to be included in Hive trunk at some point.
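
For reference, the usual decomposition pattern looks roughly like the sketch
below. Class and method names are from the 0.11-era
org.apache.hadoop.hive.ql.index package as I remember them, so treat the exact
signatures as assumptions and verify them against your Hive version.

import java.util.List;

import org.apache.hadoop.hive.ql.index.IndexPredicateAnalyzer;
import org.apache.hadoop.hive.ql.index.IndexSearchCondition;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

public class PredicateSplitSketch {
  // Splits a pushed-down predicate into simple col-op-constant conditions the
  // storage handler can evaluate, plus a residual that Hive still applies.
  // Only comparison operators registered via addComparisonOp on allowed
  // columns are accepted; OR, CASE, etc. are rejected into the residual,
  // which is why only pure AND conjunctions get pushed down today.
  public static ExprNodeDesc split(ExprNodeDesc predicate,
                                   List<IndexSearchCondition> conditions) {
    IndexPredicateAnalyzer analyzer = new IndexPredicateAnalyzer();
    analyzer.addComparisonOp(
        "org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual");
    analyzer.allowColumnName("column1");  // illustrative column name
    // 'conditions' is filled with the pushable terms; the return value is the
    // residual predicate (null when everything could be pushed)
    return analyzer.analyzePredicate(predicate, conditions);
  }
}

A storage handler would then put the translated conditions into the pushed
predicate of its DecomposedPredicate and keep the residual for Hive to
evaluate.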




2013/12/11 Petter von Dolwitz (Hem) 

> Hi,
>
> I use the HiveStoragePredicateHandler interface on my storage handler to
> be able to push down filters to my record reader for early pruning. It
> seems like the IndexPredicateAnalyzer is a bit limited on what expression
> that could be pushed down.
>
> From a comment in IndexPredicateAnalyzer:
> "We can only push down stuff which appears as part of a pure conjunction:
> reject OR, CASE, etc."
>
> As such I can push down simple AND expressions in my where clause (and
> also IN RLIKE and LIKE) but I cannot use OR.
>
> Does anybody know why this limitation exists? A guess from my side is that
> it was originally developed for HBASE integration where OR expressions
> might not translate well.  Having other data sources with custom indexes
> (or relational databases for that matter) would benefit from being able to
> handle a wider spectrum of expression.
>
> Thanks,
> Petter
>
>
>


Re: requesting access to hive confluence wiki

2013-12-10 Thread Navis류승우
Is there someone who knows how to do this?


2013/11/30 Xiao Meng 

>  Hi,
>
>
>
> I would like update/fix some contents on the performance test part.  My
> user name is xiaom.
>
>
>
> Thanks,
>
>
>
> Xiao
>


Re: hive.query.string not reflecting the current query

2013-12-03 Thread Navis류승우
Looks like a bug. I've booked this on
https://issues.apache.org/jira/browse/HIVE-5935.


2013/12/4 Adam Kawa 

> Maybe you can parse the output of EXPLAIN operator applied on your query
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
> or look for other configuration property (e.g. saying that number of map
> and reduce tasks is equal to 0, or something).
>
>
> 2013/12/3 Petter von Dolwitz (Hem) 
>
>> Yes, it seems related. I think the query string is not refreshed when
>> hive decides to run without a map reduce job. Problem is that I try to
>> interact with the query string to apply an early filter in the record
>> reader. Any other known way to detect that a map reduce job is not spawned
>> so that I can work around this issue?
>>
>> /Petter
>>
>> Den tisdagen den 3:e december 2013 skrev Adam Kawa:
>>
>> Hmmm?
>>>
>>> Maybe it is related to the fact, that a query:
>>> > select * from mytable limit 100;
>>> does not start any MapReduce job. It just starts a read operation from
>>> HDFS (and communicates with the metastore to learn the schema and how
>>> to parse the data using the InputFormat and SerDe).
>>>
>>> For example, If you run a query that has the same functionality (i.e. to
>>> show all content of the table by specifying all columns in SELECT)
>>> > select column1, column2, ... columnN from mytable limit 100;
>>> then a map-only job will be started and maybe (?) hive.query.string
>>> will contain this query..
>>>
>>>
>>> 2013/12/3 Petter von Dolwitz (Hem) 
>>>
 Hi,

 I use hive 0.11 with a five machine cluster. I am reading the property
 hive.query.string from a custom RecordReader (used for reading external
 tables).

 If I first invoke a query like

 select * from mytable where mycolumn='myvalue';

 I get the correct query string in this property.

 If I then invoke

 select * from mytable limit 100;

 the property hive.query.string still contains the first query. Seems
 like hive uses local mode for the second query. Don't know if it is 
 related.

 Anybody knows why the query string is not updated in the second case?

 Thanks,
 Petter

>>>
>>>
>


Re: HiveServer2

2013-11-19 Thread Navis류승우
I've booked https://issues.apache.org/jira/browse/HIVE-5858 for the
ALTER TABLE issue mentioned by David Morel (thanks).


2013/11/20 David Morel 

> On 18 Nov 2013, at 21:59, Stephen Sprague wrote:
>
> > A word of warning for users of HiveServer2 - version 0.11 at least. This
> > puppy has the ability to crash and/or hang your server with a memory leak.
> >
> > Apparently its not new since googling shows this discussed before and i
> see
> > reference to a workaround here:
> >
> > https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2
> >
> > Anyhoo. Consider this a Public Service Announcement. Take heed.
> >
> > Regards,
> > Stephen.
>
> When setting fs.hdfs.impl.disable.cache to false I have all my ALTER TABLE
> statements involving managed tables throw an Error 1 in Hive (nothing
> more).
> Can anyone confirm that behaviour?
>
> David
>


Re: Developing a GenericUDAF

2013-11-11 Thread Navis류승우
In handling PARTIAL1, you have:

inputOI = (StandardListObjectInspector) parameters[0];
return ObjectInspectorFactory.getStandardListObjectInspector(inputOI);

1.
inputOI is not guaranteed to be a StandardListObjectInspector.
Use ListObjectInspector instead.

2.
ObjectInspectorFactory.getStandardListObjectInspector(inputOI)

this creates a list of lists. What you probably meant is

ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.javaLongObjectInspector)
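
Putting the two points together, a minimal sketch of the corrected init() for
this evaluator (assuming a field declared as ListObjectInspector inputOI; the
field name is just illustrative):

  @Override
  public ObjectInspector init(Mode m, ObjectInspector[] parameters)
      throws HiveException {
    super.init(m, parameters);
    // In every mode the single parameter is some list of longs (the original
    // array<bigint> column or our own partial result), so the interface type
    // is enough - no cast to the concrete StandardListObjectInspector.
    inputOI = (ListObjectInspector) parameters[0];
    // Both the partial and the final result are a flat list of longs,
    // not a list of lists.
    return ObjectInspectorFactory.getStandardListObjectInspector(
        PrimitiveObjectInspectorFactory.javaLongObjectInspector);
  }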



2013/11/12 Ross Levin 

> Hello,
>
> I'm writing a generic UDAF function that closely resembles SUM() with the
> main difference being that it accepts an array datatype parameter and
> returns an array datatype.
>
> I've already done this for a GenericUDF successfully. I believe I am
> having difficulty coding the proper ObjectInspectors for my parameter &
> return objects since I am getting .ClassCastException exceptions for Long
> -> LongArray.  I am using a hybrid of the GenericUDAFSum.java sample and
> the GenericUDAFCollect sample from the Programming Hive book.
>
> My parameter is a fixed length array of longs and the return is the same
> length array of longs.  As with the SUM function, I do not need to keep the
> individual row values that I collect, I can iterate the array, SUM it to
> the container and move on to the next row.  With this in mind, I think I
> can disregard having an internalMergeOI.
>
> Any input is appreciated.
>
> Thanks,
> Ross
>
>
> Here is the exception:
> -
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: Hive Runtime Error while closing operators
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to [Ljava.lang.Object;
> at
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1137)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:199)
> ... 8 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable
> cannot be cast to [Ljava.lang.Object;
> at
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:418)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:438)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:245)
> at
> org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
> at
> org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
> at
> org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
> ... 13 more
>
> Here is the pertinent code:
>
>@Override
> public ObjectInspector init(Mode m, ObjectInspector[] parameters)
> throws HiveException
> {
> super.init(m, parameters);
> if (m == Mode.PARTIAL1)
> {
> System.out.println("1 - init() mode: " + m + "
> parameter[0]=" + parameters[0].toString());
> inputOI = (StandardListObjectInspector) parameters[0];
> return
> ObjectInspectorFactory.getStandardListObjectInspector(inputOI);
> }
> else
> {
> System.out.println("2 - init() mode: " + m + "
> parameter[0]=" + parameters[0].toString());
> JavaLongObjectInspector doi;
> doi =
> PrimitiveObjectInspectorFactory.javaLongObjectInspector;
>
> // Set up the list object inspe

Re: Using Cluster by to improve Group by Performance

2013-10-31 Thread Navis류승우
From the perspective of the ReduceSink operator (RS), the two queries differ
only in the hash code of the RS key. The cost of calculating the hash of col3
and col4 should be negligible, I think.

2013/11/1 KayVajj 

> Any response or pointers to help understand how CLUSTER BY in subqueries can
> affect the performance/speed of outer queries would be helpful.
>
> Thanks
> Kay
>
>
> On Mon, Oct 28, 2013 at 1:17 PM, KayVajj  wrote:
>
>> Hi,
>>
>> I have a question if I could use the cluster by clause in a sub query to
>> improve the performance of a group by query in hive
>>
>> Let's say I have a table A with columns (all strings) col1..col5, and the
>> table is not clustered.
>>
>> Now I'm trying to run the query below:
>>
>> select
>>> col1,
>>> col2,
>>> col3,
>>> col4,
>>> concat_ws(',', collect_set(col5))
>>> from A
>>> group by
>>> col1,
>>> col2,
>>> col3,
>>> col4
>>
>>
>>
>> Would the below query optimize the above query and if not what is the
>> best practice to optimize this query. Assuming only col1 & col2 are the
>> uniquely identifying columns
>>
>>
>>
>>
>> select
>>> ct.col1,
>>> ct.col2,
>>> ct.col3,
>>> ct.col4,
>>> concat_ws(',', collect_set(ct.col5))
>>> from
>>> (
>>> select
>>> col1,
>>> col2,
>>> col3,
>>> col4,
>>> col5
>>> from A
>>> cluster by col1, col2
>>> ) ct
>>> group by
>>> ct.col1,
>>> ct.col2,
>>> ct.col3,
>>> ct.col4.
>>
>>
>> Thanks for your responses.
>>
>>
>


Re: [ANNOUNCE] New Hive PMC Members - Thejas Nair and Brock Noland

2013-10-24 Thread Navis류승우
Congrats!


2013/10/25 Gunther Hagleitner 

> Congrats Thejas and Brock!
>
> Thanks,
> Gunther.
>
>
> On Thu, Oct 24, 2013 at 3:25 PM, Prasad Mujumdar  >wrote:
>
> >
> >Congratulations Thejas and Brock !
> >
> > thanks
> > Prasad
> >
> >
> >
> > On Thu, Oct 24, 2013 at 3:10 PM, Carl Steinbach  wrote:
> >
> >> I am pleased to announce that Thejas Nair and Brock Noland have been
> >> elected to the Hive Project Management Committee. Please join me in
> >> congratulating Thejas and Brock!
> >>
> >> Thanks.
> >>
> >> Carl
> >>
> >
> >
>
>


Re: Reply: hive 0.11 auto convert join bug report

2013-09-15 Thread Navis류승우
Hi, sorry for the late reply.

As Chun Chen said, identical hashcodes make this problem visible, but it can
happen whenever the order in which aliases appear in the JOIN expression
differs from the order of their parent operators.

Thanks.





2013/9/13 Amit Sharma 

> Hi Navis,
>
> I was trying to look at this email thread as well as the jira to
> understand the scope of this issue. Does this get triggered only in cases
> of using aliases which end up mapping to the same value upon hashing? Or
> can this be triggered under other conditions as well? What if the aliases
> are not used and the table names some how might map to similar hashcode
> values?
>
> Also is changing the alias the only workaround for this problem or is
> there any other workaround possible?
>
> Thanks,
> Amit
>
>
> On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우  wrote:
>
>> Hi,
>>
>> Hive is notorious for producing different results with different aliases.
>> Changing an alias has been a last resort for avoiding bugs in desperate
>> situations.
>>
>> I think the patch in the issue is ready; I hope it's helpful.
>>
>> Thanks.
>>
>> 2013/8/11  :
>> > Hi Navis,
>> >
>> > My colleague chenchun finds that hashcode of 'deal' and 'dim_pay_date'
>> are
>> > the same and the code in MapJoinProcessor.java ignores the order of
>> > rowschema.
>> > I look at your patch and it's exactly the same place we are working on.
>> > Thanks for your patch.
>> >
> On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
>> >
>> > Hi,
>> >
>> > I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
>> > and attached patch for it.
>> >
>> > It needs full test for confirmation but you can try it.
>> >
>> > Thanks.
>> >
>> > 2013/8/11 :
>> >
>> > Hi all:
>> > when I change the table alias dim_pay_date to A, the query pass in hive
>> > 0.11(
>> https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass
>> ):
>> >
>> > use test;
>> > create table if not exists src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
>> overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time`
>> string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small( userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT
>> > `A`.`date`
>> > , `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
>> > `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
>> > `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> >
>> > It's quite strange and interesting now. I will keep searching for the
>> answer
>> > to this issue.
>> >
>> >
>> >
>> > On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
>> >
>> > Hi all:
>> > I'm currently testing hive11 and encounter one bug with
>> > hive.auto.convert.join, I construct a testcase so everyone can reproduce
>> > it(or you can reach the testcase
>> > here:
>> https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>> >
>> > use test;
>> > create table src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
>> overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time`
>> string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small( userid int);
>> > insert overwrite table user_small select key from src limi

Re: Interesting claims that seem untrue

2013-09-12 Thread Navis류승우
It looks like they are counting lines of code by the committer's company, not
the author's.

Considering the huge amount of work by Ashutosh, that's not surprising.


2013/9/13 Sanjay Subramanian 

>  I have not read the full blog posts, but in 2013, IMHO, LOC is a very old
> metric and doesn't define good software any more...
>
>   From: Edward Capriolo 
> Reply-To: "user@hive.apache.org" 
> Date: Thursday, September 12, 2013 7:19 AM
> To: "hive-u...@hadoop.apache.org" , "<
> hive-...@hadoop.apache.org>" 
> Subject: Interesting claims that seem untrue
>
>   I was reading the horton-works blog and found an interesting article.
>
> http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753
>
>  There is a very interesting graphic which attempts to demonstrate lines
> of code in the 12 release.
> http://hortonworks.com/wp-content/uploads/2013/09/hive4.png
>
>  Although I do not know how they are calculated, they are probably
> counting code generated by tests output, but besides that they are wrong.
>
>  One claim is that Cloudera contributed 4,244 lines of code.
>
>  So to debunk that claim:
>
>  In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from
> cloudera, created the ptest2 testing framework. He did all the work for
> ptest2 in hive 12, and it is clearly more then 4,244
>
>  This consists of 84 java files
> [edward@desksandra ptest2]$ find . -name "*.java" | wc -l
> 84
>  and by itself is 8001 lines of code.
> [edward@desksandra ptest2]$ find . -name "*.java" | xargs cat | wc -l
> 8001
>
>  [edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
> 7902 HIVE-4675.patch
>
>  This is not the only feature from cloudera in hive 12.
>
>  There is also a section of the article that talks of a "ROAD MAP" for
> hive features. I did not know we (hive) had a road map. I have advocated
> switching to feature based release and having a road map before, but it was
> suggested that might limit people from itch-scratching.
>
>
>
>
>
>


Re: [ANNOUNCE] New Hive Committer - Thejas Nair

2013-08-20 Thread Navis류승우
Congratulations!

2013/8/20 Clark Yang (杨卓荦) :
> Congrats Thejas!
>
> On Tuesday, August 20, 2013, Carl Steinbach wrote:
>
>> The Apache Hive PMC has voted to make Thejas Nair a committer on the Apache
>> Hive project.
>>
>> Please join me in congratulating Thejas!
>>


Re: Bug when adding multiple partitions

2013-08-19 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-5122

2013/8/20 Navis류승우 :
> Looks like a bug. I'll fix that.
>
> 2013/8/15 Jan Dolinár :
>> Hi everyone,
>>
>> Consider following DDL:
>>
>> CREATE TABLE partition_test
>>   (a INT)
>> PARTITIONED BY (b INT);
>>
>> ALTER TABLE partition_test ADD
>> PARTITION (b=1) location '/tmp/test1'
>> PARTITION (b=2) location '/tmp/test2';
>>
>> Now lets have a look what was created:
>>
>> DESCRIBE EXTENDED partition_test PARTITION (b=1);
>> DESCRIBE EXTENDED partition_test PARTITION (b=2);
>>
>> Both describe statements yield "location:hdfs://example.com:9000/tmp/test1",
>> which is obviously incorrect.
>>
>> This behavior *is* mentioned on wiki[1], but the article speaks specifically
>> of hive 0.7. I just tested this in versions 0.7.1 and 0.10.0, and they both
>> exhibit this bug. I wasn't even able to find a JIRA for this issue, was I
>> looking wrong? Or should a new one be created?
>>
>> Best regards,
>> Jan Dolinar
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions


Re: Bug when adding multiple partitions

2013-08-19 Thread Navis류승우
Looks like a bug. I'll fix that.
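
In the meantime, an untested but straightforward workaround should be to add
the partitions in separate statements, so that each one gets its own location:

ALTER TABLE partition_test ADD PARTITION (b=1) LOCATION '/tmp/test1';
ALTER TABLE partition_test ADD PARTITION (b=2) LOCATION '/tmp/test2';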

2013/8/15 Jan Dolinár :
> Hi everyone,
>
> Consider following DDL:
>
> CREATE TABLE partition_test
>   (a INT)
> PARTITIONED BY (b INT);
>
> ALTER TABLE partition_test ADD
> PARTITION (b=1) location '/tmp/test1'
> PARTITION (b=2) location '/tmp/test2';
>
> Now lets have a look what was created:
>
> DESCRIBE EXTENDED partition_test PARTITION (b=1);
> DESCRIBE EXTENDED partition_test PARTITION (b=2);
>
> Both describe statements yield "location:hdfs://example.com:9000/tmp/test1",
> which is obviously incorrect.
>
> This behavior *is* mentioned on wiki[1], but the article speaks specifically
> of hive 0.7. I just tested this in versions 0.7.1 and 0.10.0, and they both
> exhibit this bug. I wasn't even able to find a JIRA for this issue, was I
> looking wrong? Or should a new one be created?
>
> Best regards,
> Jan Dolinar
>
> [1]
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions


Re: Problem in Hive Right Outer Join

2013-08-11 Thread Navis류승우
Could you upload DDLs for those tables? Thanks.

2013/8/8 Jérôme Verdier :
> Hi,
>
> I encountered a problem with Right Outer Join in Hive.
>
> Here is where the problem is:
>
> FROM default.ca ca
>   JOIN default.kpi_magasin mtransf
>   ON  mtransf.co_societe = (CASE WHEN ca.co_societe = 1 THEN 1 ELSE
> 2 END)
>   AND mtransf.id_magasin = ca.id_magasin
>   RIGHT OUTER JOIN default.ssect_comptable a ON a.id_ssect_cpt =
> ca.id_ssect_cpt
>   JOIN default.ssect_comptable b ON a.co_ssect_cpt = b.co_ssect_cpt
> AND b.co_societe = 6
>   JOIN default.kpi_ssect_cpt s
>   ON  s.co_societe   = (CASE WHEN ca.co_societe = 1 THEN 1 ELSE 2
> END)
>   AND (CASE
> WHEN ca.co_societe = 6 THEN
>   b.id_ssect_cpt
>   ELSE s.id_ssect_cpt
>   END = ca.id_ssect_cpt)
>   AND s.niveau   = 3
>
> here is the error code :
>
> FAILED: SemanticException [Error 10017]: Line 94:16 Both left and right
> aliases encountered in JOIN 'id_ssect_cpt'
>
> I have tried multiple options to resolve this error, but the problem is
> still there.
>
> What is wrong here?
>
> Thanks,
>
> --
> Jérôme
>
>
>


Re: Reply: hive 0.11 auto convert join bug report

2013-08-11 Thread Navis류승우
Hi,

Hive is notorious for producing different results with different aliases.
Changing an alias has been a last resort for avoiding bugs in desperate
situations.

I think the patch in the issue is ready; I hope it's helpful.

Thanks.
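
If applying the patch is not an option, one possible stopgap (untested here)
is to disable the conversion for the affected query, so the failing map-join
path is skipped at the cost of a regular shuffle join:

set hive.auto.convert.join=false;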

2013/8/11  :
> Hi Navis,
>
> My colleague chenchun found that the hashcodes of 'deal' and 'dim_pay_date'
> are the same and that the code in MapJoinProcessor.java ignores the order of
> the row schema.
> I looked at your patch and it's exactly the place we were working on.
> Thanks for your patch.
>
> On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
>
> Hi,
>
> I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
> and attached patch for it.
>
> It needs full test for confirmation but you can try it.
>
> Thanks.
>
> 2013/8/11 :
>
> Hi all:
> when I change the table alias dim_pay_date to A, the query pass in hive
> 0.11(https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>
> use test;
> create table if not exists src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `A`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> It's quite strange and interesting now. I will keep searching for the answer
> to this issue.
>
>
>
> On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
>
> Hi all:
> I'm currently testing hive11 and encounter one bug with
> hive.auto.convert.join, I construct a testcase so everyone can reproduce
> it(or you can reach the testcase
> here:https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> You should replace the path of kv1.txt by yourself. You can run the above
> query in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException, You
> can see the explain result and the console output of the query here :
> https://gist.github.com/code6/6187569
>
> I compile the trunk code but it doesn't work with this query. I can run this
> query in hive 0.9 with hive.auto.convert.join turns on.
>
> I try to dig into this problem and I think it may be caused by the map join
> optimization. Some adjacent operators aren't match for the input/output
> tableinfo(column positions diff).
>
> I'm not able to fix this bug and I would appreciate it if someone would like
> to look into this problem.
>
> Thanks.
>
>


Re: Reply: hive 0.11 auto convert join bug report

2013-08-11 Thread Navis류승우
Hi,

I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
and attached patch for it.

It needs full test for confirmation but you can try it.

Thanks.

2013/8/11  :
> Hi all:
> when I change the table alias dim_pay_date to A, the query pass in hive
> 0.11(https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>
> use test;
> create table if not exists src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
>  `A`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> It's quite strange and interesting now. I will keep searching for the answer
> to this issue.
>
>
>
> On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
>
> Hi all:
> I'm currently testing hive11 and encounter one bug with
> hive.auto.convert.join, I construct a testcase so everyone can reproduce
> it(or you can reach the testcase
> here:https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
>  `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> You should replace the path of kv1.txt by yourself. You can run the above
> query in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException, You
> can see the explain result and the console output of the query here :
> https://gist.github.com/code6/6187569
>
> I compile the trunk code but it doesn't work with this query. I can run this
> query in hive 0.9 with hive.auto.convert.join turns on.
>
> I try to dig into this problem and I think it may be caused by the map join
> optimization. Some adjacent operators aren't match for the input/output
> tableinfo(column positions diff).
>
> I'm not able to fix this bug and I would appreciate it if someone would like
> to look into this problem.
>
> Thanks.
>
>


Re: Wildcard support in specifying file location

2013-08-10 Thread Navis류승우
As described in HIVE-951, if it were implemented, the grammar might look
something like

LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';

which consists of a directory part (compatible with previous versions) and an
optional file-regex part.

Once HIVE-1662 (file pruning by a predicate on the FILE_NAME virtual column)
is committed, I'll revisit the issue.
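
Until then, a partial workaround in recent Hive versions is the
INPUT__FILE__NAME virtual column: it filters rows by their source file, though
it does not actually prune the file list. A rough sketch (table name is
illustrative):

SELECT *
FROM my_table
WHERE INPUT__FILE__NAME RLIKE 'xyz.*2009\\.bz2$';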

2013/8/11 Nitin Pawar :
> What Lefty said is correct as per my understanding.
>
> By default, Hive maps a table to a directory through the location parameter,
> but you can alter it to point to a single file (that's the hack to use a
> file as the storage location for a Hive table).
>
> But it does not support regex-based file selection for tables yet.
>
> Maybe someone from Hive (devs + designers + architects) will be able to tell
> us whether it's doable in any way.
> I just tried it; it fails miserably.
>
>
>
>
> On Sun, Aug 11, 2013 at 12:30 AM, Lefty Leverenz 
> wrote:
>>
>> I don't know the answer but my guess is no, you can't use wildcards to
>> specify file locations when creating external tables.  Since nobody else has
>> answered I suggest you just try it and see what happens.
>>
>> Or google "hive location wildcard" -- that led me to a related question on
>> stackoverflow
>> (http://stackoverflow.com/questions/14864540/can-i-have-a-hive-external-table-partition-search-recursively)
>> which points to two JIRAs, neither of which is resolved:
>>
>> - HIVE-1083  allow sub-directories for an external table/partition
>> (https://issues.apache.org/jira/browse/HIVE-1083)
>>
>> - HIVE-951  Selectively include EXTERNAL TABLE source files via REGEX
>> (https://issues.apache.org/jira/browse/HIVE-951)
>>
>> If my guess is wrong and you're able to use wildcards, please let me know
>> so I can add that information to the Hive wiki.
>>
>> -- Lefty Leverenz
>>
>>
>> On Mon, Jul 22, 2013 at 7:47 AM, pandees waran  wrote:
>>>
>>> Hi,
>>>
>>> I am newbie  to Hive . While creating external tables, can we use
>>> wildcard to specify file location.
>>> i.e:
>>>
>>> STORED AS TEXTFILE LOCATION 's3://root/*/date*/'
>>>
>>> Is the above specification valid in hive 0.7.1?
>>>
>>> Thanks
>>
>>
>>
>>
>
>
>
> --
> Nitin Pawar


Re: Get arguments' names in Hive's UDF

2013-08-07 Thread Navis류승우
I've booked this on https://issues.apache.org/jira/browse/HIVE-5025.

2013/7/22 Felix.徐 :
> Hi all,
>
> Is there any api to retrieve the parameter's column name in GenericUDF?
> For example:
>
> Select UDFTEST(columnA,columnB) from test;
>
> I want to get the column names("columnA" and "columnB") in  UDFTEST's
> initialize function via ObjectInspector but I did not find any viable
> solution.


Re: Problem with the windowing function ntile (Exceptions)

2013-07-25 Thread Navis류승우
I've booked this and attached a patch for it.

https://issues.apache.org/jira/browse/HIVE-4932

Could you test with that? thanks.

2013/7/25 Lars Francke :
> We're still being bitten by this problem without a workaround. Does
> anyone have an idea?
>
> Thanks,
> Lars
>
> On Wed, Jul 17, 2013 at 11:24 PM, Lars Francke  wrote:
>> Hi,
>>
>> I'm running a query like this:
>>
>> CREATE TABLE foo
>>   STORED AS ORC
>> AS
>> SELECT
>>   id,
>>   season,
>>   amount,
>>   ntile(10)
>> OVER (
>>   PARTITION BY season
>>   ORDER BY amount DESC
>> )
>> FROM bar;
>>
>> On a small enough dataset that works fine but when switching to a
>> larger sample we're seeing exceptions like this:
>>
>> "Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Reset on
>> PersistentByteBasedList not supported"
>>
>> Looking at the code (without really understanding it) we tried setting:
>> SET 
>> hive.ptf.partition.persistence='org.apache.hadoop.hive.ql.exec.PTFPersistence$PartitionedByteBasedList';
>>
>> because that List supports reset but we are seeing a
>> ClassNotFoundException so we're doing that wrong.
>>
>> Next try was setting hive.ptf.partition.persistence.memsize higher
>> which worked but first of all we don't really understand what all of
>> that stuff is doing and second of all we fear that it just might break
>> down again.
>>
>> Any hints as to what that error really means and how to deal with it
>> would be greatly appreciated.
>>
>> Thanks!
>>
>> Lars


Re: Calling same UDF multiple times in a SELECT query

2013-07-23 Thread Navis류승우
It will be called 4 times, whatever you annotate on the UDF, if you are using
a released version of Hive.

https://issues.apache.org/jira/browse/HIVE-4209 , which will be included in
0.12.0, will make that a single UDF call by caching the result.
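
For the query in question, a sketch of the subquery workaround (computing the
UDF once per row and reusing the value; the alias foo_val is illustrative):

SELECT
  t.a,
  IF(t.foo_val < -1, -1, t.foo_val) AS b,
  IF(t.foo_val < -1, t.foo_val, 0) AS c
FROM (
  SELECT a, fooUdf(a) AS foo_val
  FROM my_hive_table
) t;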

2013/7/24 Sanjay Subramanian :
> Thanks Jan
>
> I will mod my UDF and test it out
>
> I want to make sure I understand your words here
> "The obvious condition is that it must always return the identical result
> when called with same parameters."
>
> If I can make sure that a call to the web service is successful it will
> always return same output for a given set of input
>
> F(x1,y1) will always equal z1
>
> that's what you mean, right?
>
> sanjay
>
> From: Jan Dolinár 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, July 23, 2013 12:35 PM
> To: user 
>
> Subject: Re: Calling same UDF multiple times in a SELECT query
>
> Hi,
>
> If you use annotation, Hive should be able to optimize it to single call:
>
>  @UDFType(deterministic = true)
>
> The obvious condition is that it must always return the identical result
> when called with same parameters.
>
> Little bit more on this can be found in Mark Grovers post at
> http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html.
>
> Regards,
> Jan
>
>
> On Tue, Jul 23, 2013 at 9:25 PM, Nitin Pawar 
> wrote:
>>
>> fucntion return values are not stored for repeat use of same (as per my
>> understanding)
>>
>> I know you may have already thought about other approach as
>>
>> select a , if (call <-1, -1 call) as b from (select a, fooudf(a) as call
>> from table
>>
>>
>>
>>
>> On Wed, Jul 24, 2013 at 12:42 AM, Sanjay Subramanian
>>  wrote:
>>>
>>> Hi
>>>
>>> V r using version hive-exec-0.9.0-cdh4.1.2 in production
>>>
>>> I need to check and use the output from a UDF in a query to assign values
>>> to 2 columns in a SELECT query
>>>
>>> Example
>>>
>>> SELECT
>>>  a,
>>>  IF(fooUdf(a) < -1  , -1, fooUdf(a)) as b,
>>>  IF(fooUdf(a) < -1  , fooUdf(a), 0) as c
>>> FROM
>>>  my_hive_table
>>>
>>>
>>> So will fooUdf be called 4 times ? Or once ?
>>>
>>> Why this is important is because in our case this UDF calls a web service
>>> and I don't want so many calls to the service.
>>>
>>> Thanks
>>>
>>> sanjay
>>>
>>>
>>>
>>
>>
>>
>>
>> --
>> Nitin Pawar
>
>
>


Re: [ANNOUNCE] New Hive Committer - Gunther Hagleitner

2013-07-22 Thread Navis류승우
I'm a little late. Congratulations Gunther and Brock!

2013/7/21 Prasanth J :
> Congrats Gunther!
>
> Thanks
> -- Prasanth
>
> On Jul 21, 2013, at 1:00 AM, Carl Steinbach  wrote:
>
>> The Apache Hive PMC has voted to make Gunther Hagleitner a
>> committer on the Apache Hive project.
>>
>> Congratulations Gunther!
>>
>> Carl
>


Re: Strange error in hive

2013-07-15 Thread Navis류승우
patch -p0 -i <patch-file> would work.

https://cwiki.apache.org/confluence/display/Hive/HowToContribute
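
For example, from the root of the Hive source tree (the patch file name below
is illustrative - use whatever name you saved the HIVE-4837 attachment as):

cd /path/to/hive-src
patch -p0 -i HIVE-4837.patch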

2013/7/15 Jérôme Verdier :
> Hi,
>
> Thanks for your reply Navis.
>
> I have downloaded the patch file, but I don't know how to install it...
>
> Is there a good tutorial for this?
>
>
>
> 2013/7/10 Navis류승우 
>>
>> Attached patch for this in https://issues.apache.org/jira/browse/HIVE-4837
>>
>> 2013/7/10 Navis류승우 :
>> > Could you try to remove "NULL as FLG_DEM_INC_PRX_CS_VAL"s in the query?
>> >
>> > It seemed not related to HIVE-4650 but still a bug (I'll book this)
>> >
>> > 2013/7/9 Jérôme Verdier :
>> >> Hi,
>> >>
>> >> Thanks for your help.
>> >>
>> >> You can see logs below :
>> >>
>> >> java.lang.RuntimeException: Error in configuring object
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> >> at
>> >> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> >> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
>> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> >> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> >> at java.security.AccessController.doPrivileged(Native Method)
>> >> at javax.security.auth.Subject.doAs(Subject.java:396)
>> >> at
>> >>
>> >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>> >> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> >> Caused by: java.lang.reflect.InvocationTargetException
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> >>
>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >> at
>> >>
>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> at java.lang.reflect.Method.invoke(Method.java:597)
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> >> ... 9 more
>> >> Caused by: java.lang.RuntimeException: Error in configuring object
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> >> at
>> >> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> >> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>> >> ... 14 more
>> >> Caused by: java.lang.reflect.InvocationTargetException
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> >>
>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >> at
>> >>
>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> at java.lang.reflect.Method.invoke(Method.java:597)
>> >> at
>> >>
>> >> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> >> ... 17 more
>> >> Caused by: java.lang.RuntimeException: Map operator initialization
>> >> failed
>> >> at
>> >>
>> >> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>> >> ... 22 more
>> >> Caused by: java.lang.NullPointerException
>> >> at
>> >>
>> >> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
>> >> at java.lang.String.valueOf(String.java:2826)
>> >> at java.lang.StringBuilder.append(StringBuilder.java:115)
>> >> at
>> >>
>> >> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
>> >> at
>> >> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>> >> at
>> >> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>> >>

Re: export csv, use ',' as split

2013-07-10 Thread Navis류승우
Fixed in hive-0.11.0

https://issues.apache.org/jira/browse/HIVE-3682
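
With 0.11 the delimiter can be given directly when exporting; a minimal sketch
using the table from the question:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/upload_users_csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT username, mobile, id_type, id_no, email, address, validate_time
FROM upload_users;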

2013/7/11 Sanjay Subramanian :
> Hive does not have an output delimiter specifier yet (not sure whether
> 0.11.x has one)
>
> But for now please try the following
>
> hive -e myquery | sed 's/\t/,/g' >> result.csv
>
> Good luck
>
> Sanjay
>
> From: kentkong_work 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, July 9, 2013 9:48 PM
> To: user 
> Subject: export csv, use ',' as split
>
> hi here,
> I create a table like this and put a lot of data into it,
> then I export the query result into a csv file like this:
> hive -e myquery >> result.csv
>
> but the csv uses tab as the separator.
> How can I make hive use ','? Thanks!
>
> CREATE TABLE if not exists upload_users(
>   username string,
>   mobile string,
>   id_type string,
>   id_no string,
>   email string,
>   address string,
>   validate_time string
> ) partitioned by (fileid string)
> row format delimited fields terminated by "\,";
>
>
>


Re: Strange error in hive

2013-07-09 Thread Navis류승우
Attached patch for this in https://issues.apache.org/jira/browse/HIVE-4837

2013/7/10 Navis류승우 :
> Could you try to remove "NULL as FLG_DEM_INC_PRX_CS_VAL"s in the query?
>
> It seemed not related to HIVE-4650 but still a bug (I'll book this)
>
> 2013/7/9 Jérôme Verdier :
>> Hi,
>>
>> Thanks for your help.
>>
>> You can see logs below :
>>
>> java.lang.RuntimeException: Error in configuring object
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> ... 9 more
>> Caused by: java.lang.RuntimeException: Error in configuring object
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>> ... 14 more
>> Caused by: java.lang.reflect.InvocationTargetException
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>> ... 17 more
>> Caused by: java.lang.RuntimeException: Map operator initialization failed
>> at
>> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>> ... 22 more
>> Caused by: java.lang.NullPointerException
>> at
>> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
>> at java.lang.String.valueOf(String.java:2826)
>> at java.lang.StringBuilder.append(StringBuilder.java:115)
>> at
>> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
>> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>> at
>> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
>> at
>> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
>> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>> at
>> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
>> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>> at
>> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
>> ... 22 more
>>
>>
>>
>> 2013/7/8 
>>
>>> Hii Jerome
>>>
>>>
>>> Can you send the error log of the MapReduce task that failed? That should
>>> have some pointers which can help you troubleshoot the issue.
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from remote device, Please excuse typos
>>> 
>>> From: Jérôme Verdier 
>>> Date: Mon, 8 Jul 2013 11:25:34 +0200
>>> To: 
>>> ReplyTo: user@hive.apache.org
>>> Subject: Strange error in hive
>>>
>>> Hi everybody,
>>>
>>> I faced a strange error in hive this morning.
>>>
>>> The error message is this one :
>>>
>>> FAILED: Execution Error, return code 2 from
>>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>>
>>> after a quick search on Google, it appears that this is a Hive bug :
>>>
>>> https://issues.apache.org/jira/browse/HIVE-4650
>>>
>>> Is there a way to pass through this error ?
>>>
>>> Thanks.
>>>
>>> NB : my hive script is in the attachment.
>>>
>>>
>>> --
>>> Jérôme VERDIER
>>> 06.72.19.17.31
>>> verdier.jerom...@gmail.com
>>>
>>
>>
>>
>> --
>> Jérôme VERDIER
>> 06.72.19.17.31
>> verdier.jerom...@gmail.com
>>


Re: Strange error in hive

2013-07-09 Thread Navis류승우
Could you try removing the "NULL as FLG_DEM_INC_PRX_CS_VAL" entries from the query?

It seems unrelated to HIVE-4650, but it is still a bug (I'll file an issue for it)
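
If the NULL constants cannot simply be dropped, one possible workaround is to give them an
explicit type so every branch of the UNION ALL agrees on the column type. This is only a
sketch — the table and column names below are illustrative, and the CAST target should match
the real column type:

  SELECT u.key, u.flg_dem_inc_prx_cs_val
  FROM (
    SELECT key, CAST(NULL AS DOUBLE) AS flg_dem_inc_prx_cs_val FROM src_a
    UNION ALL
    SELECT key, some_metric AS flg_dem_inc_prx_cs_val FROM src_b
  ) u;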

2013/7/9 Jérôme Verdier :
> Hi,
>
> Thanks for your help.
>
> You can see logs below :
>
> java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
> ... 22 more
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
> at java.lang.String.valueOf(String.java:2826)
> at java.lang.StringBuilder.append(StringBuilder.java:115)
> at
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
> at
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
> ... 22 more
>
>
>
> 2013/7/8 
>
>> Hii Jerome
>>
>>
>> Can you send the error log of the MapReduce task that failed? That should
>> have some pointers which can help you troubleshoot the issue.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> 
>> From: Jérôme Verdier 
>> Date: Mon, 8 Jul 2013 11:25:34 +0200
>> To: 
>> ReplyTo: user@hive.apache.org
>> Subject: Strange error in hive
>>
>> Hi everybody,
>>
>> I faced a strange error in hive this morning.
>>
>> The error message is this one :
>>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>
>> after a quick search on Google, it appears that this is a Hive bug :
>>
>> https://issues.apache.org/jira/browse/HIVE-4650
>>
>> Is there a way to pass through this error ?
>>
>> Thanks.
>>
>> NB : my hive script is in the attachment.
>>
>>
>> --
>> Jérôme VERDIER
>> 06.72.19.17.31
>> verdier.jerom...@gmail.com
>>
>
>
>
> --
> Jérôme VERDIER
> 06.72.19.17.31
> verdier.jerom...@gmail.com
>


Re: Fetching Results from Hive Select (JDBC ResultSet.next() vs HiveClient.fetchN())

2013-07-03 Thread Navis류승우
It seems stmt.setFetchSize(1) can be called before execution
(without casting).

2013/7/3 Christian Schneider :
> Hi, i browsed through the sources and found a way to tune the JDBC
> ResultSet.next() performance.
>
> final Connection con =
> DriverManager.getConnection("jdbc:hive2://carolin:1/default", "hive",
> "");
> final Statement stmt = con.createStatement();
> final String tableName = "bigdata";
>
> sql = "select * from " + tableName + " limit 15";
> System.out.println("Running: " + sql);
> res = stmt.executeQuery(sql);
>
> // enlarge the FetchSize (default is just 50!)
> ((HiveQueryResultSet) res).setFetchSize(1);
>
> Best Regards,
> Christian.
>
>
> 2013/6/26 Christian Schneider 
>>
>> I just test the same statement with beeline and got the same bad
>> performance.
>>
>> Any ideas?
>>
>> Best Regards,
>> Chrisitan.
>>
>>
>> 2013/6/26 Christian Schneider 
>>>
>>> Hi,
>>> currently we are using HiveSever1 with the native HiveClient interface.
>>> Our application design looks horrible because (for whatever reason) it
>>> spawns a dedicated HiveServer for every query.
>>>
>>> We thought it is a good idea to switch to HiveServer2 (because the
>>> MetaStore get used by many different applications).
>>>
>>> The JDBC setup was straight forward, but the performance is not what we
>>> assumed.
>>>
>>> If we fetch a large result set (with fetchN()  over HiveClient) we read
>>> with around 10MB/s.
>>>
>>> If I use JDBC (with resultSet.next() ) i have a throughput from 1MB/min.
>>>
>>> Any chance to speed this up (like bulk fetching)?
>>>
>>> Best Regards,
>>> Christian.
>>
>>
>


Re: Possible to specify reducers for each stage?

2013-07-02 Thread Navis류승우
Currently it's not. https://issues.apache.org/jira/browse/HIVE-3946
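
Until that issue is resolved, the reducer count can only be controlled globally, and the
setting then applies to every MR stage of the query. For example:

  -- fix the number of reducers for all stages of the next query
  SET mapred.reduce.tasks=32;

  -- or let Hive estimate it per stage from the input size
  SET mapred.reduce.tasks=-1;
  SET hive.exec.reducers.bytes.per.reducer=268435456;
  SET hive.exec.reducers.max=64;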

2013/7/3 Felix.徐 :
> Hi all,
>
> Is it possible to specify reducer number for each stage ? how?
>
> thanks!


Re: Override COUNT() function

2013-07-02 Thread Navis류승우
As you expected, there is no documentation on it (as with the other optimizers).

The Javadoc for the class might be helpful, but it doesn't seem detailed enough.

2013/7/2 Peter Marron :
> Thanks Navis,
>
> This is a very interesting class which I feel pretty sure that I would never 
> have found.
> Are  there any descriptions, motivations, documentation or examples anywhere?
> I suspect that there's nothing other than the source itself, but I had to ask.
>
> Regards,
>
> Z
> -Original Message-
> From: Navis류승우 [mailto:navis@nexr.com]
> Sent: 02 July 2013 08:50
> To: user@hive.apache.org
> Subject: Re: Override COUNT() function
>
> MetadataOnlyOptimizer changes GBY on partition columns to simple TableScan 
> with one line dummy.
>
> I think similar things can be done with stats.
>
> 2013/6/28 Peter Marron :
>> Hi,
>>
>>
>>
>> I feel sure that someone has asked for this before, but here goes…
>>
>>
>>
>> In the case where I have the query
>>
>>
>>
>> SELECT COUNT(*) FROM table;
>>
>>
>>
>> There are many cases where I can determine the count immediately.
>>
>> (For example if I have run something like:
>>
>>
>>
>> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2],
>> ...)] COMPUTE STATISTICS [noscan];
>>
>>
>>
>> then there seems to be a table property “numRows” which holds a count
>> of the number of rows.
>>
>> Now I know that the COUNT function can’t always be determined easily.
>>
>> If the query is more complicated, like
>>
>>
>>
>> SELECT COUNT(*) FROM table GROUP BY column;
>>
>>
>>
>> then obviously a simple scalar count is of no real use. But is there
>> some way
>>
>> to intercept the simple case and avoid running a table scan?
>>
>>
>>
>> One problem that I see is that the COUNT function is a UDAF and I am
>>
>> assuming that the presence of any aggregate function like this is
>> enough
>>
>> to force the query planner to require a Map/Reduce. Is there anyway
>>
>> to make the function look like a simple UDF for some queries? Or
>>
>> just for some tables? I guess that I’d be prepared to sacrifice the
>> full
>>
>> generality of the normal COUNT function for one which
>>
>> only functions correctly for the simple query on my tables.
>>
>>
>>
>> So is it possible to have a different COUNT function only on certain tables?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Z
>>
>>
>


Re: Override COUNT() function

2013-07-02 Thread Navis류승우
MetadataOnlyOptimizer changes a GROUP BY on partition columns into a simple
TableScan over a one-row dummy input.

I think similar things can be done with stats.
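
As a rough illustration of the two routes (table and partition names are illustrative): a
query that touches only partition columns can typically be answered from metastore data by
the metadata-only rewrite, while ANALYZE populates the numRows-style statistics mentioned
below:

  -- only partition columns referenced: a candidate for the metadata-only rewrite
  SELECT DISTINCT ds FROM page_views;

  -- gather row counts and other stats into table/partition properties
  ANALYZE TABLE page_views PARTITION (ds='2013-06-28') COMPUTE STATISTICS;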

2013/6/28 Peter Marron :
> Hi,
>
>
>
> I feel sure that someone has asked for this before, but here goes…
>
>
>
> In the case where I have the query
>
>
>
> SELECT COUNT(*) FROM table;
>
>
>
> There are many cases where I can determine the count immediately.
>
> (For example if I have run something like:
>
>
>
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)]
> COMPUTE STATISTICS [noscan];
>
>
>
> then there seems to be a table property “numRows” which holds a count of the
> number of rows.
>
> Now I know that the COUNT function can’t always be determined easily.
>
> If the query is more complicated, like
>
>
>
> SELECT COUNT(*) FROM table GROUP BY column;
>
>
>
> then obviously a simple scalar count is of no real use. But is there some
> way
>
> to intercept the simple case and avoid running a table scan?
>
>
>
> One problem that I see is that the COUNT function is a UDAF and I am
>
> assuming that the presence of any aggregate function like this is enough
>
> to force the query planner to require a Map/Reduce. Is there anyway
>
> to make the function look like a simple UDF for some queries? Or
>
> just for some tables? I guess that I’d be prepared to sacrifice the full
>
> generality of the normal COUNT function for one which
>
> only functions correctly for the simple query on my tables.
>
>
>
> So is it possible to have a different COUNT function only on certain tables?
>
>
>
> Regards,
>
>
>
> Z
>
>


Re: Re: different outer join plan between hive 0.9 and hive 0.10

2013-07-01 Thread Navis류승우
Yes, a little bit.

IMHO, these flags could be assigned only to aliases that have a condition in
the 'on' clause. Then, I think, even a byte (8 flags) would be enough in
most cases.

I'll do that if time permits.

2013/7/1 wzc1989 :
> hi navis:
> look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464),  I
> understand what you mean by "hive tags rows a filter mask as a short for
> outer join, which can contain 16 flags. " . I wonder why not choose Long or
> int which can contain 64/32 tags. Does adding one Long/int in every row cost
> too much?
>
> --
> wzc1989
> Sent with Sparrow
>
> On Tuesday, May 14, 2013, at 2:17 PM, Navis류승우 wrote:
>
> In short, hive tags rows a filter mask as a short for outer join,
> which can contain 16 flags. (see HIVE-3411, plz)
>
> I'll survey for a solution.
>
> 2013/5/14 wzc1989 :
>
> "hive cannot merge joins of 16+ aliases with outer join into single stage."
> In our use case we use one table full outer join all other table to produce
> one big table, which may exceed 16 outer join limits and will be split into
> multi stage under hive 0.10.
> It become very slow under hive 0.10 while we run such query well under hive
> 0.9.
> I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
> merge join 16+ aliases into single stage while hive 0.9 doesn't have such
> issue. could you explain this or give me some hint?
>
> Thanks!
>
> --
> wzc1989
> Sent with Sparrow
>
> On Tuesday, May 14, 2013, at 12:26 PM, Navis류승우 wrote:
>
> The error message means hive cannot merge joins of 16+ aliases with
> outer join into single stage. It was 8 way originally (HIVE-3411) but
> expanded to 16 later.
>
> Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
>
> 2013/5/14 wzc1989 :
>
> This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> commits and the above explain result matches in hive 0.9 and hive 0.10,
> thanks!
> But I confuse about this error msg:
>
> JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> join(s) " +
> "cannot have more than 16 aliases"),
>
> does this mean in hive0.10 when we have more than 16 outer join the query
> plan will still have some bug?
> I test the sql below and find the explain result still diff between hive 0.9
> and hive 0.10.
>
> explain select
> sum(a.value) val
> from default.test_join a
> left outer join default.test_join b on a.key = b.key
> left outer join default.test_join c on a.key = c.key
> left outer join default.test_join d on a.key = d.key
> left outer join default.test_join e on a.key = e.key
> left outer join default.test_join f on a.key = f.key
> left outer join default.test_join g on a.key = g.key
> left outer join default.test_join h on a.key = h.key
> left outer join default.test_join i on a.key = i.key
> left outer join default.test_join j on a.key = j.key
> left outer join default.test_join k on a.key = k.key
> left outer join default.test_join l on a.key = l.key
> left outer join default.test_join m on a.key = m.key
> left outer join default.test_join n on a.key = n.key
> left outer join default.test_join u on a.key = u.key
> left outer join default.test_join v on a.key = v.key
> left outer join default.test_join w on a.key = w.key
> left outer join default.test_join x on a.key = x.key
> left outer join default.test_join z on a.key = z.key
>
>
> --
> wzc1989
> Sent with Sparrow
>
> On Friday, March 29, 2013, at 9:34 AM, Navis류승우 wrote:
>
> The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> HIVE-3464) and still not completely fixed even in trunk.
>
> Will be fixed shortly.
>
> 2013/3/29 wzc :
>
> The bug remains even if I apply the patch in HIVE-4206 :( The explain
> result hasn't change.
>
>
> 2013/3/28 Navis류승우 
>
>
> It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
>
> Thanks for reporting it.
>
> 2013/3/24 wzc :
>
> Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> of
> our hive queries almost 7 times slow. One of such query consists
> multiple
> table outer join on the same key. By looking into the query, we found
> the
> query plans generate by hive 0.9 and hive 0.10 are different. Here is
> the
> example:
>
> testcase:
>
> use default;
> create table test_join (
> `key` string,
> `value` string
> );
>
> explain select
> sum(a.value) val
> from default.test_join a
> left outer join default.test_join b on a.key = b.key
> left outer join default.test_join c on a.key = c.key
> left outer join default.test_join d on a.key = d.key
> left outer join default.test_join e on a.key = e.key
> left outer jo

Re: Use of virtual columns in joins

2013-06-26 Thread Navis류승우
Yes, it's a bug. I've filed it as https://issues.apache.org/jira/browse/HIVE-4790.

2013/6/25 Peter Marron :
> Hi,
>
>
>
> Sorry for the delay but I finally got around to testing these queries with
> Hive version 11.
>
> Things are improved. Two of the three queries now run fine. However one
> query still fails.
>
> So this query runs fine:
>
>
>
> SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON
> b.rownumber = a.number;
>
> But this one (which is _very_ similar)
>
>
>
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON
> b.rownumber = a.number;
>
> fails with this error:
>
>
>
> > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber =
> a.number;
>
> Automatically selecting local only mode for query
>
> Total MapReduce jobs = 1
>
> setting HADOOP_USER_NAMEpmarron
>
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property
> hive.metastore.local no longer has any effect. Make sure to provide a valid
> value for hive.metastore.uris if you are connecting to a remote metastore.
>
> Execution log at: /tmp/pmarron/.log
>
> 2013-06-25 10:52:56 Starting to launch local task to process map join;
> maximum memory = 932118528
>
> java.lang.RuntimeException: cannot find field block__offset__inside__file
> from [0:rownumber, 1:offset]
>
> at
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
>
> at
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
>
> at
> org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
>
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>
> at
> org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
>
> at
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
>
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>
> at
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
>
> at
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
>
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> at java.lang.reflect.Method.invoke(Method.java:597)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Execution failed with exit status: 2
>
> Obtaining error information
>
>
>
> Task failed!
>
> Task ID:
>
>   Stage-4
>
>
>
> Logs:
>
>
>
> /tmp/pmarron/hive.log
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapredLocalTask
>
>
>
> There really doesn’t seem to be anything helpful in the logs either.
>
> It seems a little weird that it can find the virtual column in the first
> table, but not the second.
>
> Again, these are not blocking me. I’m just reporting these results as they
> may expose a bug.
>
>
>
> Regards,
>
>
>
> Z
>
>
>
> From: Ashutosh Chauhan [mailto:hashut...@apache.org]
> Sent: 10 June 2013 16:48
> To: user@hive.apache.org
> Subject: Re: Use of virtual columns in joins
>
>
>
> You might be hitting into https://issues.apache.org/jira/browse/HIVE-4033 in
> which case its recommended that you upgrade to 0.11 where in this bug is
> fixed.
>
>
>
> On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron
>  wrote:
>
> Hi,
>
>
>
> I’m using hive 0.10.0 over hadoop 1.0.4.
>
>
>
> I have created a couple of test tables and found that  various join queries
>
> that refer to virtual columns fail. For example the query:
>
>
>
> SELECT * FROM a JOIN b ON b.rownumber = a.number;
>
>
>
> works but the following three queries all fail.
>
>
>
> SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber =
> a.number;
>
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber =
> a.number;
>
> SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE;
>
>
>
> They all fail in the same way, but I am too much of a newb to be able to
>
> tell much from the error message:
>
>
>
> Error during job, obtaining debugging in

Re: How to terminate a running Hive Query? (Executed with JDBC, Hive Server 2)

2013-06-25 Thread Navis류승우
We use https://issues.apache.org/jira/browse/HIVE-3235 and kill jobs if
needed.

2013/6/26 Stephen Sprague 

> all it is is a comment on the line above the first statement - and that'll
> show up in the jobtracker.  Just as he shows in his example.
>
>
> On Tue, Jun 25, 2013 at 11:05 AM, Robin Verlangen  wrote:
>
>> Hi Christian,
>>
>> Sounds like a work around, but how do you prefix the job with a certain
>> name? Is that possible with a hive query statement?
>>
>> Best regards,
>>
>> Robin Verlangen
>> *Data Architect*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> 
>> *What is CloudPelican? *
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>> On Tue, Jun 25, 2013 at 7:49 PM, Christian Schneider <
>> cschneiderpub...@gmail.com> wrote:
>>
>>> Hi Stephen, thanks for the anser.
>>>
>>> Identifying to the JobId is not that easy. I also tought about this.
>>> Our application adds now a unique prefix to all queries. With this we
>>> can identify the job.
>>>
>>> Smht. like this:
>>>
>>> -- UUID: 3242-414-124-14...
>>> SELECT * FROM foobar;
>>>
>>> Now, our application can filter by Job Names starting with: -- UUID:
>>> 3242-414-124-14... to kill the query.
>>> But i think this is more a workaround then a reliable, or?
>>>
>>> Best Regards,
>>> Christian.
>>>
>>>
>>> 2013/6/25 Stephen Sprague 
>>>
 Well... if the query created a MR job on your cluster then there's
 always:

 1. use jobtracker to find your job id.
 2. use hadoop job -kill   to nuke it.

 you're saying there's no way to interrupt/kill the query from the
 client?  That very well may be the case.


 On Tue, Jun 25, 2013 at 10:22 AM, Christian Schneider <
 cschneiderpub...@gmail.com> wrote:

> I figured out that there are two implementations of the Hive JDBC
> driver in the hive-jdbc-0.10-cdh4.2.0 jar.
>
> 1. org.apache.hadoop.hive.jdbc.HiveStatement
> 2. org.apache.hive.jdbc.HiveStatement
>
> The 1. implements .close() and .cancel() but it will not delete the
> running jobs on the cluster anyway.
>
> Any suggestions?
>
>
> 2013/6/25 Christian Schneider 
>
>> Hi,
>> is it possible to kill a running query (including all the hadoop jobs
>> behind)?
>>
>> I think it's not, because the Hive JDBC Driver doesn't implement
>> .close() and .cancel() on the (prepared) statement.
>>
>> This attached code shows the problem.
>>
>> Bevor the statement gets executed, it will spawn a Thread that tries
>> to stop the execution of the query after 10 sec.
>>
>> Are there any other ways to stop the job on the cluster?
>>
>> I could do it over the Job Client, but for that i need the JobId.
>>
>> Thanks a lot.
>>
>>
>> Best Regards,
>>
>> Christian.
>>
>
>

>>>
>>
>


Re: TempStatsStore derby.log

2013-06-23 Thread Navis류승우
I think it's created by the stats publisher, which uses Derby by default.

It can be disabled by setting hive.stats.autogather=false;
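
For example, either of the following should prevent the Derby-based TempStatsStore from
being created (the MySQL connection string is purely illustrative):

  -- stop gathering table stats during inserts/CTAS for this session
  SET hive.stats.autogather=false;

  -- or keep gathering stats but publish them to a different store
  SET hive.stats.dbclass=jdbc:mysql;
  SET hive.stats.dbconnectionstring=jdbc:mysql://statshost:3306/hive_tmp_stats;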

2013/6/22 Raj Hadoop :
> Hi,
>
> I have Hive metastore created in an Oracle database.
>
> But when i execute my Hive queries , I see following directory and file
> created.
> TempStatsStore  (directory)
> derby.log
>
> What are this? Can one one suggest why derby log is created even though my
> javax.jdo.option.ConnectionURL is pointing to Oracle?
>
> Thanks,
> Raj
>


Re: Wrong values returned for nullable columns in hbase tables when accessed via hive

2013-06-16 Thread Navis류승우
Looks like it's https://issues.apache.org/jira/browse/HIVE-3179, which
is fixed in hive 0.11.0

2013/6/14 Rupinder Singh :
> Hi all,
>
>
>
> I am facing an issue when selecting a nullable column twice in a hive select
> statement against an hbase table. For rows where that column is null, 2
> different values are returned: one null (correct) and the second is the last
> non-null value.
>
> Has anyone seen this issue?
>
>
>
> The query:
>
> hive> select visitorId, memberId, dateCreated, memberId from
> hbase_page_view;
>
>
>
> The results:
>
> 003320da-01bf-4ddc-80e9-5d070389a53dNULL2013-05-21 15:19:31.781
>
> 007be1d9-a93a-4ca5-a5d3-e6047aa5cfd0NULL2013-05-21 14:18:14.623
>
> 00fe4cc7-e7a9-4351-8f9a-8165547dc4f5NULL2013-05-21 15:27:03.628
>
> 00fe4cc7-e7a9-4351-8f9a-8165547dc4f5NULL2013-05-21 15:27:34.174
>
> 00fe4cc7-e7a9-4351-8f9a-8165547dc4f578913   2013-05-21 15:28:58.714
> 78913
>
> 00fe4cc7-e7a9-4351-8f9a-8165547dc4f578913   2013-05-21 15:29:25.765
> 78913
>
> 01004f8b-8817-4866-84c7-b40e634a41d9NULL2013-05-21 14:36:29.405
> 78913
>
> 01619700-11f2-4b88-a9ef-55dc609379caNULL2013-05-21 15:56:11.157
> 78913
>
> 01a5036a-7ebc-4f35-9be7-461ef318b4c9NULL2013-05-21 14:06:18.014
> 78913
>
> 01d91d8a-bd17-44aa-9604-cf6c8ca3d4f3NULL2013-05-21 14:05:02.095
> 78913
>
> 01d91d8a-bd17-44aa-9604-cf6c8ca3d4f389464   2013-05-21 14:05:51.820
> 89464
>
> 01d91d8a-bd17-44aa-9604-cf6c8ca3d4f389464   2013-05-21 14:06:53.558
> 89464
>
> 01d91d8a-bd17-44aa-9604-cf6c8ca3d4f389464   2013-05-21 14:07:23.479
> 89464
>
> 01d91d8a-bd17-44aa-9604-cf6c8ca3d4f389464   2013-05-21 14:13:59.841
> 89464
>
> 0207b0bd-f3ca-4b9a-acf1-afc401146195NULL2013-05-21 14:16:36.733
> 89464
>
> 0207b0bd-f3ca-4b9a-acf1-afc401146195NULL2013-05-21 14:28:12.305
> 89464
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e2NULL2013-05-21 14:20:08.159
> 89464
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e2NULL2013-05-21 14:21:08.006
> 89464
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e2NULL2013-05-21 14:21:21.178
> 89464
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e289472   2013-05-21 14:22:05.391
> 89472
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e289472   2013-05-21 14:24:38.619
> 89472
>
> 0526fae9-310a-4fbc-b2e9-4ecb5a4d21e289472   2013-05-21 14:31:32.279
> 89472
>
> 0559118e-559a-469e-aeff-90254b128fa6NULL2013-05-21 14:02:17.084
> 89472
>
> 0559118e-559a-469e-aeff-90254b128fa6NULL2013-05-21 14:02:31.437
> 89472
>
> 0559118e-559a-469e-aeff-90254b128fa6NULL2013-05-21 14:03:43.456
> 89472
>
> 05bc7517-ceac-40f7-81c9-6da6d1c9713bNULL2013-05-21 15:57:49.375
> 89472
>
> 0625208e-015a-46be-85b5-ed5102af3d7cNULL2013-05-21 15:29:29.004
> 89472
>
>
>
> The hbase table has a single column family. It is mapped to an external hive
> table using the standard hive idiom like so:
>
> CREATE EXTERNAL TABLE hbase_page_view(key string, visitorId string,
> dateCreated string, memberId string, blah blah)
>
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES ("hbase.columns.mapping" =
> ":key,eve:visid,eve:datec,eve:mid,.")
>
> TBLPROPERTIES ("hbase.table.name" = "page_view");
>
>
>
> thanks
>
> Rupinder
>
>
>
>
>
>
>


Re: Hive Query having virtual column INPUT__FILE__NAME in where clause gives exception

2013-06-15 Thread Navis류승우
Firstly, the exception looks like https://issues.apache.org/jira/browse/HIVE-3926.

Secondly, file selection on a virtual column (file name, etc.) is
https://issues.apache.org/jira/browse/HIVE-1662.

Neither of them is fixed yet.
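
A cleaned-up version of the subquery workaround suggested further down this thread —
unverified there and hedged here as well, since it simply moves the filter on the virtual
column into an outer query:

  SELECT COUNT(*)
  FROM (
    SELECT INPUT__FILE__NAME AS filename FROM netflow
  ) t
  WHERE t.filename = 'vzb.1351794600.0';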

2013/6/14 Nitin Pawar :
> Jitendra,
> I am really not sure you can use virtual columns in where clause.  (I never
> tried it so I may be wrong as well).
>
> can you try executing your query as below
>
> select count(*), filename from (select INPUT__FILE__NAME as filename from
> netflow)tmp  where filename='vzb.1351794600.0';
>
> please check for query syntax. I am giving an idea and have not verified the
> query
>
>
> On Fri, Jun 14, 2013 at 4:57 PM, Jitendra Kumar Singh
>  wrote:
>>
>> Hi Guys,
>>
>> Executing hive query with filter on virtual column INPUT_FILE_NAME result
>> in following exception.
>>
>> hive> select count(*) from netflow where
>> INPUT__FILE__NAME='vzb.1351794600.0';
>>
>> FAILED: SemanticException java.lang.RuntimeException: cannot find field
>> input__file__name from
>> [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@1d264bf5,
>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@3d44d0c6,
>>
>> .
>>
>> .
>>
>> .
>>
>>
>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@7e6bc5aa]
>>
>> This error is different from the one we get when column name is wrong
>>
>> hive> select count(*) from netflow where
>> INPUT__FILE__NAM='vzb.1351794600.0';
>>
>> FAILED: SemanticException [Error 10004]: Line 1:35 Invalid table alias or
>> column reference 'INPUT__FILE__NAM': (possible column names are: first,
>> last, )
>>
>> But using this virtual column in select clause works fine.
>>
>> hive> select INPUT__FILE__NAME from netflow group by INPUT__FILE__NAME;
>>
>> Total MapReduce jobs = 1
>>
>> Launching Job 1 out of 1
>>
>> Number of reduce tasks not specified. Estimated from input data size: 4
>>
>> In order to change the average load for a reducer (in bytes):
>>
>>   set hive.exec.reducers.bytes.per.reducer=
>>
>> In order to limit the maximum number of reducers:
>>
>>   set hive.exec.reducers.max=
>>
>> In order to set a constant number of reducers:
>>
>>   set mapred.reduce.tasks=
>>
>> Starting Job = job_201306041359_0006, Tracking URL =
>> http://192.168.0.224:50030/jobdetails.jsp?jobid=job_201306041359_0006
>>
>> Kill Command = /opt/hadoop/bin/../bin/hadoop job  -kill
>> job_201306041359_0006
>>
>> Hadoop job information for Stage-1: number of mappers: 12; number of
>> reducers: 4
>>
>> 2013-06-14 18:20:10,265 Stage-1 map = 0%,  reduce = 0%
>>
>> 2013-06-14 18:20:33,363 Stage-1 map = 8%,  reduce = 0%
>>
>> .
>>
>> .
>>
>> .
>>
>> 2013-06-14 18:21:15,554 Stage-1 map = 100%,  reduce = 100%
>>
>> Ended Job = job_201306041359_0006
>>
>> MapReduce Jobs Launched:
>>
>> Job 0: Map: 12  Reduce: 4   HDFS Read: 3107826046 HDFS Write: 55 SUCCESS
>>
>> Total MapReduce CPU Time Spent: 0 msec
>>
>> OK
>>
>> hdfs://192.168.0.224:9000/data/jk/vzb/vzb.1351794600.0
>>
>> Time taken: 78.467 seconds
>>
>> I am trying to create external hive table on already present HDFS data.
>> And I have extra files in the folder that I want to ignore. Similar to what
>> is asked and suggested in following stackflow questions how to make hive
>> take only specific files as input from hdfs folder when creating an external
>> table in hive can I point the location to specific files in a direcotry?
>>
>> Any help would be appreciated. Full stack trace I am getting is as follows
>>
>> 2013-06-14 15:01:32,608 ERROR ql.Driver
>> (SessionState.java:printError(401)) - FAILED: SemanticException
>> java.lang.RuntimeException: cannot find field input__
>>
>> org.apache.hadoop.hive.ql.parse.SemanticException:
>> java.lang.RuntimeException: cannot find field input__file__name from
>> [org.apache.hadoop.hive.serde2.object
>>
>> at
>> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:122)
>>
>> at
>> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>>
>> at
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
>>
>> at
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
>>
>> at
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
>>
>> at
>> org.apache.hadoop.hive.ql.optimizer.pcr.PartitionConditionRemover.transform(PartitionConditionRemover.java:86)
>>
>> at
>> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:102)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8163)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:50)
>>

Re: Access context from UDF

2013-06-13 Thread Navis류승우
If the VCs are not referenced in the query, they will not be populated in
ExecMapper in the first place.

If you can do something with the Reporter instance in MR,
https://issues.apache.org/jira/browse/HIVE-3628 would be helpful.
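
Failing that, the fallback mentioned in the original mail is to pass the virtual column to
the UDF explicitly as an ordinary argument (my_offset_udf, value and some_table below are
hypothetical names):

  SELECT my_offset_udf(value, BLOCK__OFFSET__INSIDE__FILE)
  FROM some_table;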

2013/5/30 Peter Marron :
> Hi,
>
>
>
> Using Hive 0.10.0 over Hadoop 1.0.4.
>
>
>
> I guess that I know that this is a long shot.
>
> Is there any way to access the context from inside a UDF?
>
> Specifically I want to get hold of the value of the virtual
>
> column BLOCK__OFFSET__INSIDE__FILE from inside a
>
> UDF that I’m implementing. Of course I can pass it in as
>
> a parameter but it would be nice if I didn’t have to.
>
> So is there some way I can interrogate the value of this column?
>
> Or, seeing as I can, that the value of the virtual column
>
> is set using the value of
>
> ctx.getIoCxt().getCurrentBlockStart();
>
> where ctx is an ExecMapperContext
>
> is there some way that I can get hold of this
>
> directly? Can the UDF get hold of the ExecMapperContext?
>
>
>
> Thanks in advance.
>
>
>
> Z
>
>
>
>


Re: Enhancing Query Join to speed up Query

2013-06-13 Thread Navis류승우
You can use "explain" for confirming differences. For inner joins, it
would make the same plan.
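
For example, running the two variants from the thread below through EXPLAIN makes it easy to
see whether the extra ON-clause predicate changes anything:

  EXPLAIN
  SELECT a.item_id, a.create_dt
  FROM   a JOIN b ON (a.item_id = b.item_id)
  WHERE  a.item_id = 'I001' AND a.category_name = 'C001';

  EXPLAIN
  SELECT a.item_id, a.create_dt
  FROM   a JOIN b ON (a.item_id = b.item_id AND a.item_id = 'I001')
  WHERE  a.category_name = 'C001';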

2013/6/14 Igor Tatarinov :
> I would expect no difference because of predicate pushdown.
>
> igor
> decide.com
>
>
> On Thu, Jun 13, 2013 at 11:31 AM, Naga Vijay  wrote:
>>
>> Sure, Will do
>>
>>
>> On Thu, Jun 13, 2013 at 10:42 AM, Stephen Sprague 
>> wrote:
>>>
>>> Hi naja,
>>> test those two versions (or three now) and report back to the group.  :)
>>> even if some smarty-pants thinks he knows the answer its always good to
>>> confirm things are as they should be.
>>>
>>>
>>> On Wed, Jun 12, 2013 at 11:54 PM, Sanjay Subramanian
>>>  wrote:

 Hi

 I would actually do it like this…so that the set on the left of JOIN
 becomes smaller

 SELECT a.item_id, a.create_dt
 FROM
  ( SELECT
 item_id, create_dt
   FROM
 A
   WHERE
item_id = 'I001'
AND
   category_name = 'C001'
   )  a
 JOIN
  b
 ON
 a.item_id = b.item_id
 ;


 From: Naga Vijay 
 Reply-To: "user@hive.apache.org" 
 Date: Wednesday, June 12, 2013 9:17 PM
 To: "user@hive.apache.org" 
 Subject: Enhancing Query Join to speed up Query

 Hi,

 Which of the two query options is better?

 SELECT a.item_id, a.create_dt
 FROM   a JOIN b
 ON (a.item_id = b.item_id)
 WHERE  a.item_id = 'I001'
 ANDa.category_name = 'C001';

 - or -

 SELECT a.item_id, a.create_dt
 FROM   a JOIN b
 ON (a.item_id = b.item_id AND a.item_id = 'I001')
 WHERE  a.category_name = 'C001';

 Thanks
 Naga

>>>
>>>
>>
>

