Re: Hive for registration process

2015-11-04 Thread Noam Hasson
Hi Kunal,

It won't. Hive is not built for fast web access, but rather for heavy
analytics.
Working properly with your SQL server will give you very good results; when you
start working with millions of registrations and do reach a limit, try exploring NoSQL
solutions like Cassandra, Couchbase, etc.

Noam.


On Wed, Nov 4, 2015 at 9:28 AM, Kunal Gaikwad  wrote:

> Hi all,
>
> I have an e-commerce app which has a registration process(it is a 5 step
> registration process). On the submit button the data is pushed on the SQL
> db currently. I wanna know can Hive help me with 1000 or more user
> registrations at a time? If yes how can I go about it?  and how fast will
> it be?
>
> Currently there are 7 tables data to be stored in SQL. Can using Hadoop
> help with many user registration?
>
> I stated 1000 as it is a start, eventually more number of users will be
> registering. the launch of the app is through India
>
>
> Thanks and regards,
> Kunal Anil Gaikwad
> Hadoop Developer
> +91 9029648475
>



Re: mapjoin with left join

2015-09-20 Thread Noam Hasson
Not sure if this will help you, but you can try using the map-join hint,
which basically tells Hive to put a specific table in memory:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-PriorSupportforMAPJOIN
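A minimal sketch of that hint applied to the query in this thread (whether Hive
honors it for this left-outer shape is exactly the question Sergey discusses
below, so treat it as something to try rather than a guaranteed fix):

SELECT /*+ MAPJOIN(s) */ s.*          -- ask Hive to build the in-memory hash table from "small"
FROM small s
LEFT JOIN large l ON s.id = l.id
WHERE l.id IS NULL;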

On Fri, Sep 11, 2015 at 11:16 PM, Sergey Shelukhin 
wrote:

> As far as I know it’s not currently supported.
> The large table will be streamed in multiple tasks with the small table in
> memory, so there’s not one place that knows for sure there was no row in
> the large table for a particular small table row in any of the locations.
> It could have no match in one task but a match in other task.
> You can try rewriting the query as inner join unioned with not in, but
> “not in” might still be slow…
> IIRC there was actually a JIRA to solve this, but no work has been done so
> far.
>
> From: Steve Howard 
> Reply-To: "user@hive.apache.org" 
> Date: Friday, September 11, 2015 at 09:48
> To: "user@hive.apache.org" 
> Subject: mapjoin with left join
>
> We would like to utilize mapjoin for the following SQL construct:
>
> select small.* from small s left join large l on s.id = l.id where l.id is
> null;
>
> We can easily fit small into RAM, but large is over 1TB according to
> optimizer stats. Unless we set
> hive.auto.convert.join.noconditionaltask.size to at least the size of
> "large", the optimizer falls back to a common join, which is incredibly
> slow.
>
> Given the fact it is a left join, which means we won't always have rows in
> large for each row in small, is this behavior expected? Could it be that
> reading the large table would miss the new rows in small, so the large one
> has to be the one that is probed for matches?
>
> We simply want to load the 81K rows into RAM, then for each row in large,
> check the small hash table, and if a row in small is not in large,
> add it to large.
>
> Again, the optimizer will use a mapjoin if we set
> hive.auto.convert.join.noconditionaltask.size = 1TB (the size of the large
> table). This is, of course, not practical. The small table is only 50MB.
>
> At the link below is the entire test case with two tables, one of which
> has three rows and the other has 96. We can duplicate it with tables this
> small, which leads me to believe I am missing something, or this is a bug.
>
> The link has the source code that shows each table create, as well as the
> explain plan with an argument for hive.auto.convert.join.noconditionaltask.size
> that is passed at the command line. The output shows a mergejoin when
> hive.auto.convert.join.noconditionaltask.size is less than 192 (the
> size of the larger table), and a mapjoin when
> hive.auto.convert.join.noconditionaltask.size is larger than 192 (the large
> table fits).
>
> http://pastebin.com/Qg6hb8yV
>
> The business case is loading only new rows into a large fact table.  The
> new rows are the ones that are small in number.
>



Re: Repair table doesnt update the transient_lastDdlTime of updated partitions.

2015-08-25 Thread Noam Hasson
Hi,

Check if this helps you:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionTouch
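A minimal sketch of the TOUCH statement that page describes (the table name and
partition spec below are illustrative): it rewrites the partition's metadata,
which bumps transient_lastDdlTime without moving any data.

-- Run once per partition whose files were replaced out-of-band.
ALTER TABLE my_external_table TOUCH PARTITION (dt='2015-08-25');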

Noam.

On Tue, Aug 25, 2015 at 6:43 PM, ravi teja raviort...@gmail.com wrote:

 Sorry for the incomplete mail, sent by mistake.

 I am working towards an incremental solution on Hive based on the
 transient_lastDdlTime of the partitions.
 We mostly deal with Hive external tables.

 The transient_lastDdlTime of a partition gets updated when the insertion
 into the table happens via the insert-query route, so we are good there.

 But the issue is, if a file-level update happens in the partition
 folder, then Hive doesn't update transient_lastDdlTime for that partition,
 and we are not able to get the list of changed partitions because of this.


 Unfortunately we can't change the way the Hive table is being updated; it's
 based on file-level updates to the underlying location.
 When we do a file-based ingestion, we do have the complete list of
 updated partitions.
 But this cannot be passed to the incremental system, hence our source of
 truth is the Hive metastore and its transient_lastDdlTime.

 Is there a way I can update the transient_lastDdlTime in the
 metastore for the partitions changed by adding files?
 I have tried to re-add the changed partitions to the table so that
 transient_lastDdlTime would change, but that is not possible,
 as it throws an "already exists" exception.

 Is there any other way?
 Thanks in advance.

 Thanks,
 Ravi

 On Tue, Aug 25, 2015 at 9:02 PM, ravi teja raviort...@gmail.com wrote:

 Hi,

 I am working towards a incremental solution on hive based on the
 transient_lastDdlTime of the partitions.
 If the we in

 Thanks,
 Ravi






Re: Run multiple queries simultaneously

2015-08-25 Thread Noam Hasson
I would just limit the resources given to the user on YARN.
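One hedged way to do that from the Hive side (the queue names below are
illustrative and must already exist in your scheduler configuration) is to pin
each session to its own capped YARN queue:

-- Session A: the "alone" baseline run.
set mapreduce.job.queuename=bench_baseline;
-- Session B: the concurrent load, submitted to a queue with a smaller capacity.
set mapreduce.job.queuename=bench_load;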

On Tue, Aug 25, 2015 at 4:21 PM, Raajay raaja...@gmail.com wrote:

 Hello,

 I want to compare the running time of a query when run alone against its
 run time in the presence of other queries.

 What is the ideal setup required to run this experiment? Should I have
 two Hive CLIs open and issue queries simultaneously? How can I script such an
 experiment in Hive?

 Raajay




Re: Hive over JDBC disable task conversion

2015-08-25 Thread Noam Hasson
Hi Emil,

If you are referring to getting results back without running a map-reduce job,
then I don't believe it's possible; Hive must run map-reduce for the ORDER BY
part.
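If the goal is just to be sure the setting actually takes effect for the
session, a hedged alternative to the URL parameter is to issue it as a
statement on the same JDBC connection before the query (the table and column
names below are placeholders); note that with an ORDER BY a map-reduce job is
still launched either way:

set hive.fetch.task.conversion=none;
SELECT * FROM big_table ORDER BY some_col;   -- still compiles to map-reduce because of the ORDER BY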

Noam.

On Thu, Aug 20, 2015 at 6:57 PM, Emil Berglind papasw...@gmail.com wrote:

 I’m running a Hive query over JDBC in a Java app that I wrote. I want to
 be able to turn off task conversion as I am looking to stream the data
 back. I thought I could do that by using the following JDBC URL:
 jdbc:hive2://192.168.132.128:1/default?hive.fetch.task.conversion=none.
 My SQL statement has an ORDER BY in it, but other than that it is just a
 straight up “SELECT * FROM table name”. The task conversion is still
 occurring, and that causes the job to blow up, because the table has 30+
 million rows in it. I just want to stream the data so I can take advantage
 of the fetch size and read the data in batches.




Re: Hive Concurrency support

2015-08-23 Thread Noam Hasson
If you are looking to support concurrency check this param:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency

I believe it will allow you to run several different inserts into the same
partitions, but I don't know what kind of corruption/collision scenarios
are possible.
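A minimal sketch of the settings the linked page covers (the lock manager needs
a ZooKeeper quorum; the hostnames below are illustrative):

set hive.support.concurrency=true;
set hive.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;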

On Fri, Aug 21, 2015 at 9:02 PM, Suyog Parlikar suyogparli...@gmail.com
wrote:

 Thanks Elliot,

 For the immediate reply.

 But as per the Hive locking mechanism,
 while inserting data into a partition Hive acquires an exclusive lock on that
 partition and a shared lock on the entire table.

 How is it possible to insert data into a different partition of the same
 table while holding a shared lock on the table, which does not allow write
 operations?

 Please correct me if my understanding of this is wrong.
 (I am using HQL inserts only for these operations.)

 Thanks,
 Suyog
 On Aug 21, 2015 7:28 PM, Elliot West tea...@gmail.com wrote:

 I presume you mean into different partitions of a table at the same
 time? This should be possible. It is certainly supported by the streaming
 API, which is probably where you want to look if you need to insert large
 volumes of data to multiple partitions concurrently. I can't see why it
 would not also be possible with HQL INSERTs.

 On Friday, 21 August 2015, Suyog Parlikar suyogparli...@gmail.com
 wrote:

 Can we insert data into different partitions of a table at the same time?

 Waiting for inputs .

 Thanks in advance.

 - suyog





Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Noam Hasson
Hi,

Have you looked at the counters on the Hadoop side? It's possible you are dealing
with a bad join which causes a multiplication of rows; if you see a huge
number of input/output records in the map/reduce phases that keeps increasing,
that's probably the case.

Another thing I would try is to divide the job into several smaller
queries, for example start with the filter only, after that the join, and so
on.
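A hedged sketch of that split, using the table from this thread (the filter
predicate and the intermediate table name are placeholders, since the
comparison operators were garbled in the original mail):

-- Step 1: materialize only the filtered rows and the columns needed downstream.
CREATE TABLE huge_numeric_filtered STORED AS ORC AS
SELECT col1, col6, col11, col16
FROM huge_numeric_table_orc2
WHERE col1 > 34.11;   -- placeholder predicate

-- Step 2: run the self-join and the averages against the much smaller intermediate table.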

Noam.

On Thu, Aug 20, 2015 at 10:55 AM, Nishant Aggarwal nishant@gmail.com
wrote:

 Dear Hive Users,

 I am in the process of running a PoC for one of my customers, demonstrating
 the huge performance benefits of Hadoop/Big Data using Hive.

 Following is the problem statement I am stuck with.

 I have generated a large table with 28 columns (all doubles). Table size
 on disk is 70GB (I ultimately created a compressed table using the ORC format
 to save disk space, bringing the table size down to 1GB) with more than
 450 million records.

 In order to demonstrate a complex use case I joined this table with
 itself. Following are the query I used to create the table and the join
 query I am using.

 *Create Table and Loading Data, Hive parameter settings:*
 set hive.vectorized.execution.enabled = true;
 set hive.vectorized.execution.reduce.enabled = true;
 set mapred.max.split.size=1;
 set mapred.min.split.size=100;
 set hive.auto.convert.join=false;
 set hive.enforce.sorting=true;
 set hive.enforce.bucketing=true;
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set mapreduce.reduce.input.limit=-1;
 set hive.exec.parallel = true;

 CREATE TABLE huge_numeric_table_orc2(col1 double,col2 double,col3
 double,col4 double,col5 double,col6 double,col7 double,col8 double,col9
 double,col10 double,col11 double,col12 double,col13 double,col14
 double,col15 double,col16 double,col17 double,col18 double,col19
 double,col20 double,col21 double,col22 double,col23 double,col24
 double,col25 double,col26 double,col27 double,col28 double)
 clustered by (col1) sorted by (col1) into 240 buckets
 STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

 from huge_numeric_table insert overwrite table huge_numeric_table_orc2
 select * sort by col1;


 *JOIN QUERY:*

 select (avg(t1.col1)*avg(t1.col6))/(avg(t1.col11)*avg(t1.col16)) as AVG5
 from huge_numeric_table_orc2 t1 left outer join huge_numeric_table_orc2 t2
 on t1.col1=t2.col1 where (t1.col1)  34.11 and (t2.col1) 10.12


 *The problem is that this query gets stuck at 80-85% of the reducers, goes
 into a loop, and never finishes.*

 Version of Hive is 1.2.

 Please help.


 Thanks and Regards
 Nishant Aggarwal, PMP
 Cell No:- +91 99588 94305





Re: query behaviors with subquery in clause

2015-08-20 Thread Noam Hasson
I have observed in another situation that whenever you run queries where you don't
specify the partitions statically, Hive doesn't pre-compute which ones to read,
so it will scan the whole table.

I would suggest computing the max date in code, in a separate query, and then
passing it into the main query.
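A hedged sketch of that two-step approach (the hivevar plumbing is illustrative;
any client-side substitution works):

-- Step 1: fetch the cut-off value on its own.
SELECT max(date) FROM B;
-- Step 2: re-issue the main query with the value substituted as a literal, e.g.
--   hive --hivevar max_date='2015-08-09' -f query.hql
SELECT * FROM A WHERE date = '${hivevar:max_date}';   -- now prunes to a single partition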


On Thu, Aug 20, 2015 at 12:16 PM, Nitin Pawar nitinpawar...@gmail.com
wrote:

 any help guys ?

 On Thu, Aug 13, 2015 at 2:52 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 Hi,

 right now Hive does not support an equality clause with a sub-query,
 for ex:  select * from A where date = (select max(date) from B)

 It does, though, support the IN clause:
 select * from A where date in (select max(date) from B)

 Table A is partitioned by the date column, so I was hoping that
 when I apply the IN clause it would look only at that partition, but it is
 reading the entire table:

 select * from A where date='2015-08-09' ... reads one partition
 select * from A where date in ('2015-08-09') ... reads one partition
 select * from A where date in (select max(date) from B) ... reads all
 partitions from A

 Am I missing something, or am I doing something wrong?

 --
 Nitin Pawar




 --
 Nitin Pawar




Re: hiveserver2 hangs

2015-08-20 Thread Noam Hasson
We had a case of retrieving a record that was bigger than the GC limit, for
example a column with an Array or Map type that had 1M cells.

On Wed, Aug 19, 2015 at 9:35 PM, Sanjeev Verma sanjeev.verm...@gmail.com
wrote:

 Can somebody give me some pointers on what to look at?

 On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma sanjeev.verm...@gmail.com
 wrote:

 Hi,
 We are experiencing a strange problem with HiveServer2: in one of the
 jobs it hits a GC limit exceeded error in a mapred task and hangs, even though
 enough heap is available. We are not able to identify what is causing this issue.
 Could anybody help me identify the issue and let me know what pointers I
 need to look at?

 Thanks






Re: hive benchmark

2015-08-11 Thread Noam Hasson
Sure,

Even a single node can support it; it's all a question of processing
time.


On Tue, Aug 11, 2015 at 9:31 AM, siva kumar siva165...@gmail.com wrote:

 Hi Folks,
   I need to insert 1 billion records into Hive, and
 here are my cluster details.

 1. 6-node Hadoop cluster cluster.
 2. 16GB RAM on each node.
 3. 2TB Hard-disk on each node.

 Is this configuration suitable for storing 1 billion records? If not, what
 do we need in order to store and read 1 billion records?

 Thanks and regards,
 siva





Re: alter table add column

2015-07-09 Thread Noam Hasson
If you want to add data to already existing rows, you'll have to insert
the data again using INSERT OVERWRITE; perhaps it's better to insert
into a new table.
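A hedged sketch of that backfill (the table, column, and partition names below
are illustrative):

-- Add the column, then rewrite each existing partition so the new column is populated.
ALTER TABLE my_part_table ADD COLUMNS (new_col STRING);
INSERT OVERWRITE TABLE my_part_table PARTITION (dt='2015-07-08')
SELECT col1, col2, 'backfilled' AS new_col
FROM my_part_table
WHERE dt='2015-07-08';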

On Wed, Jul 8, 2015 at 11:57 PM, Mona Meena dr@hotmail.com wrote:

 Hi,

 I have a partitioned table. Is it possible to alter this table by adding a
 new column and also to update the table by inserting data into the new column?
 I know how to add a new column, but I have no idea how to insert data into the
 new column. Any suggestions, please?

 BR,
 Mona




Re: Hive BeeLine

2015-07-06 Thread Noam Hasson
Just making sure: LOAD DATA LOCAL INPATH loads files from your local file
system. Did you make sure the file exists on your machine?
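A hedged illustration of the difference (paths and the table name are
placeholders); also note that when going through HiveServer2/BeeLine, as far as
I recall the LOCAL path is resolved on the HiveServer2 host's filesystem, not
on the BeeLine client machine:

LOAD DATA LOCAL INPATH '/tmp/data.csv' INTO TABLE my_table;              -- local filesystem path
LOAD DATA INPATH '/user/trainee/staging/data.csv' INTO TABLE my_table;   -- path already in HDFS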

On Mon, Jul 6, 2015 at 12:15 PM, Trainee Bingo trainee1...@gmail.com
wrote:

 Hi Users,

 I have Hive and HiveServer2 on the same machine. But when I try to *LOAD
 DATA LOCAL INPATH* using BeeLine I get an Invalid Path error, while if I do
 LOAD DATA INPATH it works successfully.

 Can anyone please tell me why the local inpath does not work?



 Thanks,
 Trainee.




Re: hive locate from s3 - query

2015-07-05 Thread Noam Hasson
You can try to create a view and use json_tuple to parse the column after
removing the newline.
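A hedged sketch of that view (the key names come from the table definition
quoted below; the raw table name and the regexp are illustrative):

-- Read each raw line as a single string, strip leading whitespace/newlines, then parse the JSON.
CREATE EXTERNAL TABLE work_raw (line STRING) LOCATION 's3n://work/';
CREATE VIEW work_v AS
SELECT t.time, t.uid, t.type
FROM work_raw
LATERAL VIEW json_tuple(regexp_replace(line, '^\\s+', ''), 'time', 'uid', 'type') t AS time, uid, type;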

On Fri, Jul 3, 2015 at 5:28 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:

 You probably need to make your own serde/input format that trims the line.

 On Fri, Jul 3, 2015 at 8:15 AM, ram kumar ramkumarro...@gmail.com wrote:

 When I map the Hive table to the S3 path,
 it throws an exception because of the new line at the beginning of the line.
 Is there a solution to trim the new line at the beginning in Hive?
 Or any alternatives?


 CREATE EXTERNAL TABLE work (
 time BIGINT,
 uid STRING,
 type STRING
 )
 ROW FORMAT SERDE 'com.proofpoint.hive.serde.JsonSerde'
 LOCATION 's3n://work/';



 hive> select * from work;
 Failed with exception java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: error parsing JSON



 Thanks






Insert overwrite - Move to trash part is extremely slow

2015-07-02 Thread Noam Hasson
Hi All,

We are running an INSERT OVERWRITE query with dynamic partitioning in Hive;
when it reaches the part where it deletes the old partition data:
Moved:
'hdfs://BICluster/user/noamh/hive/p_noamh/dw_fact_marketing_daily_by_year/date_year=2014/ksname=KS1513/000141_0'
to trash at: hdfs://BICluster/user/noamh/.Trash/Current

each move command takes several seconds. In tables with a small number of
partitions each move takes about a second, while in tables with a large number
of partitions it takes up to 10 seconds per move.
It's important to note that this only happens on the move part; creating the
table without the insert overwrite, where there is no need to delete the
partition, runs much, much faster.

Using full trace logging on Hive, we examined the log and found that for each
move command Hive queries HDFS for all the partitions in the table; in other
words, for each move it queries HDFS thousands of times.
For a table with 1,000 partitions, running the insert overwrite will query HDFS
about 1,000,000 times.

It seems that before each partition delete, Hive for some reason lists
all the other partitions in the table.
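For context, a hedged reconstruction of the statement shape that triggers this
(the table and partition columns come from the paths in the log; the select
list and the source table are placeholders):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE dw_fact_marketing_daily_by_year PARTITION (date_year, ksname)
SELECT metric_value, date_year, ksname    -- placeholder select list
FROM dw_fact_marketing_daily_src;         -- placeholder source table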

Here is a sample of the log:
2015-07-02 08:11:21,432 DEBUG [IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh]: ipc.Client (Client.java:receiveRpcResponse(1065)) - IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh got value #1939
2015-07-02 08:11:21,432 DEBUG [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(221)) - Call: getListing took 1ms
2015-07-02 08:11:21,432 TRACE [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(236)) - 1: Response - ecprdbhdp02-namenode/10.53.210.153:8020: getListing {dirList { partialListing { fileType: IS_FILE path: 000332_0 length: 2217675 permission { perm: 511 } owner: noamh group: dba modification_time: 1435838591925 access_time: 1435838591722 block_replication: 2 blocksize: 134217728 fileId: 42896800 childrenNum: 0 } remainingEntries: 0 }}
2015-07-02 08:11:21,433 TRACE [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1: Call - ecprdbhdp02-namenode/10.53.210.153:8020: getListing {src: /user/noamh/hive/p_noamh/dw_fact_marketing_daily_by_year/date_year=2014/ksname=KS5058 startAfter:  needLocation: false}
2015-07-02 08:11:21,433 DEBUG [IPC Parameter Sending Thread #0]: ipc.Client (Client.java:run(1008)) - IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh sending #1940
2015-07-02 08:11:21,434 DEBUG [IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh]: ipc.Client (Client.java:receiveRpcResponse(1065)) - IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh got value #1940
2015-07-02 08:11:21,434 DEBUG [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(221)) - Call: getListing took 1ms
2015-07-02 08:11:21,434 TRACE [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(236)) - 1: Response - ecprdbhdp02-namenode/10.53.210.153:8020: getListing {dirList { partialListing { fileType: IS_FILE path: 000333_0 length: 189581 permission { perm: 511 } owner: noamh group: dba modification_time: 1435838592633 access_time: 1435838592558 block_replication: 2 blocksize: 134217728 fileId: 42896823 childrenNum: 0 } remainingEntries: 0 }}
2015-07-02 08:11:21,435 TRACE [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1: Call - ecprdbhdp02-namenode/10.53.210.153:8020: getListing {src: /user/noamh/hive/p_noamh/dw_fact_marketing_daily_by_year/date_year=2014/ksname=KS5060 startAfter:  needLocation: false}
2015-07-02 08:11:21,435 DEBUG [IPC Parameter Sending Thread #0]: ipc.Client (Client.java:run(1008)) - IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh sending #1941
2015-07-02 08:11:21,436 DEBUG [IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh]: ipc.Client (Client.java:receiveRpcResponse(1065)) - IPC Client (584199013) connection to ecprdbhdp02-namenode/10.53.210.153:8020 from noamh got value #1941
2015-07-02 08:11:21,436 DEBUG [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(221)) - Call: getListing took 1ms
2015-07-02 08:11:21,436 TRACE [main]: ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(236)) - 1: Response - ecprdbhdp02-namenode/10.53.210.153:8020: getListing {dirList { partialListing { fileType: IS_FILE path: 000356_0 length: 8729 permission { perm: 511 } owner: noamh group: dba modification_time: 1435838592168 access_time: 1435838591924 block_replication: 2