Re: Hive compaction didn't launch

2016-07-28 Thread Eugene Koifman
I think Storm has some timeout parameter that will close the transaction
if there are no events for a certain amount of time.
How many transactions do you have per transaction batch?  Perhaps making the
batches smaller will make them close sooner.

Eugene
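
A minimal sketch of the tuning Eugene suggests, assuming the storm-hive connector's HiveOptions API (class and method names from memory and may differ across Storm versions; the metastore URI, table, and field names below are placeholders, not from this thread):

```java
// Sketch: configuring storm-hive so transaction batches close sooner.
// Requires storm-hive on the classpath; all names below are illustrative.
import org.apache.storm.hive.bolt.HiveBolt;
import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
import org.apache.storm.hive.common.HiveOptions;
import org.apache.storm.tuple.Fields;

public class HiveBoltConfig {
    public static HiveBolt build() {
        DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(new Fields("id", "msg"))
                .withPartitionFields(new Fields("dt"));
        HiveOptions options = new HiveOptions(
                "thrift://metastore-host:9083", "default", "data_aaa", mapper)
                .withTxnsPerBatch(10)       // fewer txns per batch: batches close sooner
                .withBatchSize(1000)        // tuples written per transaction
                .withIdleTimeout(60)        // seconds before an idle writer is closed,
                                            // instead of heartbeating the txn open forever
                .withHeartBeatInterval(30);
        return new HiveBolt(options);
    }
}
```

Smaller withTxnsPerBatch values mean each batch spans fewer transactions and gets closed, and therefore becomes compactable, sooner.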


On 7/28/16, 3:59 PM, "Alan Gates"  wrote:

>But until those transactions are closed you don't know that they won't
>write to partition B.  After they write to A they may choose to write to
>B and then commit.  The compactor can not make any assumptions about what
>sessions with open transactions will do in the future.
>
>Alan.
>
>> On Jul 28, 2016, at 09:19, Igor Kuzmenko  wrote:
>> 
>> But this minOpenTxn value isn't from the delta I want to compact.
>>minOpenTxn can point to a transaction in partition A while in partition B
>>there are deltas ready for compaction. If minOpenTxn is less than the txnIds
>>in partition B's deltas, compaction won't happen. So an open transaction in
>>partition A blocks compaction in partition B. That seems wrong to me.
>> 
>> On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates 
>>wrote:
>> Hive is doing the right thing there, as it cannot compact the deltas
>>into a base file while there are still open transactions in the delta.
>>Storm should be committing on some frequency even if it doesn't have
>>enough data to commit.
>> 
>> Alan.
>> 
>> > On Jul 28, 2016, at 05:36, Igor Kuzmenko  wrote:
>> >
>> > I made some research on that issue.
>> > The problem is in ValidCompactorTxnList::isTxnRangeValid method.
>> >
>> > Here's code:
>> > @Override
>> > public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
>> >   if (highWatermark < minTxnId) {
>> > return RangeResponse.NONE;
>> >   } else if (minOpenTxn < 0) {
>> > return highWatermark >= maxTxnId ? RangeResponse.ALL :
>>RangeResponse.NONE;
>> >   } else {
>> > return minOpenTxn > maxTxnId ? RangeResponse.ALL :
>>RangeResponse.NONE;
>> >   }
>> > }
>> >
>> > In my case this method returned RangeResponse.NONE for most of the delta
>>files. With this value the delta file isn't included in compaction.
>> >
>> > The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is
>>bigger it returns RangeResponse.NONE. That's a problem for me because I'm
>>using the Storm Hive Bolt. The Hive Bolt gets a transaction and keeps it open
>>with heartbeats until there's data to commit.
>> >
>> > So if I get a transaction and keep it open, all compactions will
>>stop. Is this incorrect Hive behavior, or should Storm close the transaction?
>> >
>> >
>> >
>> >
>> > On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko 
>>wrote:
>> > Thanks for the reply, Alan. My guess about Storm was wrong. Today I got
>>the same behavior with the Storm topology running.
>> > Anyway, I'd like to know: how can I check that a transaction batch was
>>closed correctly?
>> >
>> > On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates 
>>wrote:
>> > I don't know the details of how the storm application that streams
>>into Hive works, but this sounds like the transaction batches weren't
>>getting closed.  Compaction can't happen until those batches are closed.
>> Do you know how you had storm configured?  Also, you might ask
>>separately on the storm list to see if people have seen this issue
>>before.
>> >
>> > Alan.
>> >
>> > > On Jul 27, 2016, at 03:31, Igor Kuzmenko  wrote:
>> > >
>> > > One more thing. I'm using Apache Storm to stream data in Hive. And
>>when I turned off Storm topology compactions started to work properly.
>> > >
>> > > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko 
>>wrote:
>> > > I'm using a Hive 1.2.1 transactional table, inserting data into it via
>>the Hive Streaming API. After some time I expected compaction to start, but
>>it didn't:
>> > >
>> > > Here's part of log, which shows that compactor initiator thread
>>doesn't see any delta files:
>> > > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator
>>(Initiator.java:run(89)) - Checking to see if we should compact
>>default.data_aaa.dt=20160726
>> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils
>>(AcidUtils.java:getAcidState(432)) - in directory
>>hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=2016
>>0726 base = null deltas = 0
>> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator
>>(Initiator.java:determineCompactionType(271)) - delta size: 0 base size:
>>0 threshold: 0.1 will major compact: false
>> > >
>> > > But in that directory there's actually 23 files:
>> > >
>> > > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
>> > > Found 23 items
>> > > -rw-r--r--   3 storm hdfs  4 2016-07-26 17:20
>>/apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
>> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22
>>/apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
>> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23
>>/apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
>> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25
>>/apps/hive/warehouse/data_aaa/dt=20160726/delta_717

Re: Hive compaction didn't launch

2016-07-28 Thread Alan Gates
But until those transactions are closed you don’t know that they won’t write to 
partition B.  After they write to A they may choose to write to B and then 
commit.  The compactor can not make any assumptions about what sessions with 
open transactions will do in the future.

Alan.

> On Jul 28, 2016, at 09:19, Igor Kuzmenko  wrote:
> 
> But this minOpenTxn value isn't from the delta I want to compact. minOpenTxn 
> can point to a transaction in partition A while in partition B there are deltas 
> ready for compaction. If minOpenTxn is less than the txnIds in partition B's 
> deltas, compaction won't happen. So an open transaction in partition A blocks 
> compaction in partition B. That seems wrong to me.
> 
> On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates  wrote:
> Hive is doing the right thing there, as it cannot compact the deltas into a 
> base file while there are still open transactions in the delta.  Storm should 
> be committing on some frequency even if it doesn’t have enough data to commit.
> 
> Alan.
> 
> > On Jul 28, 2016, at 05:36, Igor Kuzmenko  wrote:
> >
> > I made some research on that issue.
> > The problem is in ValidCompactorTxnList::isTxnRangeValid method.
> >
> > Here's code:
> > @Override
> > public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
> >   if (highWatermark < minTxnId) {
> > return RangeResponse.NONE;
> >   } else if (minOpenTxn < 0) {
> > return highWatermark >= maxTxnId ? RangeResponse.ALL : 
> > RangeResponse.NONE;
> >   } else {
> > return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
> >   }
> > }
> >
> > In my case this method returned RangeResponse.NONE for most of the delta 
> > files. With this value the delta file isn't included in compaction.
> >
> > The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is 
> > bigger it returns RangeResponse.NONE. That's a problem for me because I'm 
> > using the Storm Hive Bolt. The Hive Bolt gets a transaction and keeps it 
> > open with heartbeats until there's data to commit.
> >
> > So if I get a transaction and keep it open, all compactions will stop. Is 
> > this incorrect Hive behavior, or should Storm close the transaction?
> >
> >
> >
> >
> > On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko  wrote:
> > Thanks for the reply, Alan. My guess about Storm was wrong. Today I got the 
> > same behavior with the Storm topology running.
> > Anyway, I'd like to know: how can I check that a transaction batch was 
> > closed correctly?
> >
> > On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates  wrote:
> > I don’t know the details of how the storm application that streams into 
> > Hive works, but this sounds like the transaction batches weren’t getting 
> > closed.  Compaction can’t happen until those batches are closed.  Do you 
> > know how you had storm configured?  Also, you might ask separately on the 
> > storm list to see if people have seen this issue before.
> >
> > Alan.
> >
> > > On Jul 27, 2016, at 03:31, Igor Kuzmenko  wrote:
> > >
> > > One more thing. I'm using Apache Storm to stream data in Hive. And when I 
> > > turned off Storm topology compactions started to work properly.
> > >
> > > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko  wrote:
> > > I'm using a Hive 1.2.1 transactional table, inserting data into it via the 
> > > Hive Streaming API. After some time I expected compaction to start, but it 
> > > didn't:
> > >
> > > Here's part of log, which shows that compactor initiator thread doesn't 
> > > see any delta files:
> > > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator 
> > > (Initiator.java:run(89)) - Checking to see if we should compact 
> > > default.data_aaa.dt=20160726
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils 
> > > (AcidUtils.java:getAcidState(432)) - in directory 
> > > hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726
> > >  base = null deltas = 0
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator 
> > > (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 
> > > 0 threshold: 0.1 will major compact: false
> > >
> > > But in that directory there's actually 23 files:
> > >
> > > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
> > > Found 23 items
> > > -rw-r--r--   3 storm hdfs  4 2016-07-26 17:20 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:26 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:27 
> > > /apps/hive/warehouse/data_aaa/dt=20160726/delt

Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
But this *minOpenTxn* value isn't from the delta I want to compact.
*minOpenTxn* can point to a transaction in partition *A* while in partition
*B* there are deltas ready for compaction. If *minOpenTxn* is less than the
txnIds in partition *B* deltas, compaction won't happen. So an open
transaction in partition *A* blocks compaction in partition *B*. That seems
wrong to me.
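
To make the cross-partition effect concrete, here is a self-contained sketch that mirrors the isTxnRangeValid logic quoted further down in this message; the transaction ids are invented for illustration. A single open transaction below a fully committed delta's range forces RangeResponse.NONE for that delta, regardless of which partition the open transaction is actually writing to:

```java
// Demonstrates how minOpenTxn vetoes compaction of any delta whose txn
// range lies above it, even if that delta is fully committed.
public class CompactorRangeDemo {
    enum RangeResponse { NONE, ALL }

    // Same decision logic as ValidCompactorTxnList.isTxnRangeValid, with the
    // two fields (highWatermark, minOpenTxn) passed in as parameters.
    static RangeResponse isTxnRangeValid(long highWatermark, long minOpenTxn,
                                         long minTxnId, long maxTxnId) {
        if (highWatermark < minTxnId) {
            return RangeResponse.NONE;
        } else if (minOpenTxn < 0) {
            return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
        } else {
            return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
        }
    }

    public static void main(String[] args) {
        long highWatermark = 71_900_000L; // highest allocated txn id
        long minOpenTxn    = 71_741_300L; // open txn (say, writing partition A)

        // Committed delta in partition B whose range is above minOpenTxn:
        System.out.println(isTxnRangeValid(highWatermark, minOpenTxn,
                71_762_456L, 71_762_555L)); // NONE -> excluded from compaction

        // A delta entirely below the open txn is still compactable:
        System.out.println(isTxnRangeValid(highWatermark, minOpenTxn,
                71_741_256L, 71_741_299L)); // ALL
    }
}
```

The check is global to the transaction list, not per partition, which is exactly the behavior described above.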

On Thu, Jul 28, 2016 at 7:06 PM, Alan Gates  wrote:

> Hive is doing the right thing there, as it cannot compact the deltas into
> a base file while there are still open transactions in the delta.  Storm
> should be committing on some frequency even if it doesn’t have enough data
> to commit.
>
> Alan.
>
> > On Jul 28, 2016, at 05:36, Igor Kuzmenko  wrote:
> >
> > I made some research on that issue.
> > The problem is in ValidCompactorTxnList::isTxnRangeValid method.
> >
> > Here's code:
> > @Override
> > public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
> >   if (highWatermark < minTxnId) {
> > return RangeResponse.NONE;
> >   } else if (minOpenTxn < 0) {
> > return highWatermark >= maxTxnId ? RangeResponse.ALL :
> RangeResponse.NONE;
> >   } else {
> > return minOpenTxn > maxTxnId ? RangeResponse.ALL :
> RangeResponse.NONE;
> >   }
> > }
> >
> > In my case this method returned RangeResponse.NONE for most of the delta
> files. With this value the delta file isn't included in compaction.
> >
> > The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is
> bigger it returns RangeResponse.NONE. That's a problem for me because I'm using
> the Storm Hive Bolt. The Hive Bolt gets a transaction and keeps it open with
> heartbeats until there's data to commit.
> >
> > So if I get a transaction and keep it open, all compactions will stop.
> Is this incorrect Hive behavior, or should Storm close the transaction?
> >
> >
> >
> >
> > On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko 
> wrote:
> > Thanks for the reply, Alan. My guess about Storm was wrong. Today I got the
> same behavior with the Storm topology running.
> > Anyway, I'd like to know: how can I check that a transaction batch was
> closed correctly?
> >
> > On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates 
> wrote:
> > I don’t know the details of how the storm application that streams into
> Hive works, but this sounds like the transaction batches weren’t getting
> closed.  Compaction can’t happen until those batches are closed.  Do you
> know how you had storm configured?  Also, you might ask separately on the
> storm list to see if people have seen this issue before.
> >
> > Alan.
> >
> > > On Jul 27, 2016, at 03:31, Igor Kuzmenko  wrote:
> > >
> > > One more thing. I'm using Apache Storm to stream data in Hive. And
> when I turned off Storm topology compactions started to work properly.
> > >
> > > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko 
> wrote:
> > > I'm using a Hive 1.2.1 transactional table, inserting data into it via
> the Hive Streaming API. After some time I expected compaction to start, but it
> didn't:
> > >
> > > Here's part of log, which shows that compactor initiator thread
> doesn't see any delta files:
> > > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator
> (Initiator.java:run(89)) - Checking to see if we should compact
> default.data_aaa.dt=20160726
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils
> (AcidUtils.java:getAcidState(432)) - in directory hdfs://
> sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726
> base = null deltas = 0
> > > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator
> (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0
> threshold: 0.1 will major compact: false
> > >
> > > But in that directory there's actually 23 files:
> > >
> > > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
> > > Found 23 items
> > > -rw-r--r--   3 storm hdfs  4 2016-07-26 17:20
> /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:26
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:27
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:29
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:30
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
> > > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:32
> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
> > > drwxrwxrwx   - storm hdfs  0 2016-0

Re: Hive compaction didn't launch

2016-07-28 Thread Alan Gates
Hive is doing the right thing there, as it cannot compact the deltas into a 
base file while there are still open transactions in the delta.  Storm should 
be committing on some frequency even if it doesn’t have enough data to commit.

Alan.

> On Jul 28, 2016, at 05:36, Igor Kuzmenko  wrote:
> 
> I made some research on that issue.
> The problem is in ValidCompactorTxnList::isTxnRangeValid method.
> 
> Here's code:
> @Override
> public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
>   if (highWatermark < minTxnId) {
> return RangeResponse.NONE;
>   } else if (minOpenTxn < 0) {
> return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
>   } else {
> return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
>   }
> }
> 
> In my case this method returned RangeResponse.NONE for most of the delta 
> files. With this value the delta file isn't included in compaction.
> 
> The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is 
> bigger it returns RangeResponse.NONE. That's a problem for me because I'm 
> using the Storm Hive Bolt. The Hive Bolt gets a transaction and keeps it 
> open with heartbeats until there's data to commit.
> 
> So if I get a transaction and keep it open, all compactions will stop. Is 
> this incorrect Hive behavior, or should Storm close the transaction?
> 
> 
> 
> 
> On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko  wrote:
> Thanks for the reply, Alan. My guess about Storm was wrong. Today I got the 
> same behavior with the Storm topology running. 
> Anyway, I'd like to know: how can I check that a transaction batch was closed 
> correctly?
> 
> On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates  wrote:
> I don’t know the details of how the storm application that streams into Hive 
> works, but this sounds like the transaction batches weren’t getting closed.  
> Compaction can’t happen until those batches are closed.  Do you know how you 
> had storm configured?  Also, you might ask separately on the storm list to 
> see if people have seen this issue before.
> 
> Alan.
> 
> > On Jul 27, 2016, at 03:31, Igor Kuzmenko  wrote:
> >
> > One more thing. I'm using Apache Storm to stream data in Hive. And when I 
> > turned off Storm topology compactions started to work properly.
> >
> > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko  wrote:
> > I'm using a Hive 1.2.1 transactional table, inserting data into it via the 
> > Hive Streaming API. After some time I expected compaction to start, but it 
> > didn't:
> >
> > Here's part of log, which shows that compactor initiator thread doesn't see 
> > any delta files:
> > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator 
> > (Initiator.java:run(89)) - Checking to see if we should compact 
> > default.data_aaa.dt=20160726
> > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils 
> > (AcidUtils.java:getAcidState(432)) - in directory 
> > hdfs://sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726
> >  base = null deltas = 0
> > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator 
> > (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0 
> > threshold: 0.1 will major compact: false
> >
> > But in that directory there's actually 23 files:
> >
> > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
> > Found 23 items
> > -rw-r--r--   3 storm hdfs  4 2016-07-26 17:20 
> > /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:26 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:27 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:29 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:30 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:32 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:33 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71867356_71867455
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:34 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71891556_71891655
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:36 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71904856_71904955
> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:37 
> > /apps/hive/warehouse/data_aaa/dt=20160726/delta_71907256_71907355
> > drwxrwxrw

Fwd: Building Spark 2 from source that does not include the Hive jars

2016-07-28 Thread Mich Talebzadeh
Anyone in Hive forum knows about this?

Thanks

This has worked before, including with 1.6.1 etc.

The goal is to build Spark without the Hive jars, the idea being to use Spark
as Hive's execution engine.

There are some notes on Hive on Spark: Getting Started.


The usual process is to do

dev/make-distribution.sh --name "hadoop2-without-hive" --tgz
"-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

However, now I am getting this warning:
[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 10:08 min (Wall Clock)
[INFO] Finished at: 2016-07-27T15:07:11+01:00
[INFO] Final Memory: 98M/1909M
[INFO]

+ rm -rf /data6/hduser/spark-2.0.0/dist
+ mkdir -p /data6/hduser/spark-2.0.0/dist/jars
+ echo 'Spark [WARNING] The requested profile "parquet-provided" could not
be activated because it does not exist. built for Hadoop [WARNING] The
requested profile "parquet-provided" could not be activated because it does
not exist.'
+ echo 'Build flags: -Pyarn,hadoop-provided,hadoop-2.6,parquet-provided'


And this is the only tgz file I see

./spark-[WARNING] The requested profile "parquet-provided" could not be
activated because it does not exist.-bin-hadoop2-without-hive.tgz

Any clues as to what is happening, and the correct way of creating the build?

My interest is to extract from the build a jar file similar to the one below:

 spark-assembly-1.3.1-hadoop2.4.0.jar

Thanks


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
I did some research on that issue.
The problem is in the ValidCompactorTxnList::isTxnRangeValid method.

Here's code:

@Override
public RangeResponse isTxnRangeValid(long minTxnId, long maxTxnId) {
  if (highWatermark < minTxnId) {
return RangeResponse.NONE;
  } else if (minOpenTxn < 0) {
return highWatermark >= maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
  } else {
return minOpenTxn > maxTxnId ? RangeResponse.ALL : RangeResponse.NONE;
  }
}


In my case this method returned RangeResponse.NONE for most of the delta files.
With this value the delta file isn't included in compaction.

The last 'else' block compares minOpenTxn to maxTxnId, and if maxTxnId is
bigger it returns *RangeResponse.NONE*. That's a problem for me because I'm
using the Storm Hive Bolt. The Hive Bolt gets a transaction and keeps it open
with heartbeats until there's data to commit.

So if I get a transaction and keep it open, all compactions will stop. Is
this incorrect Hive behavior, or should Storm close the transaction?
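
For reference, the commit-and-close discipline being asked about looks roughly like this against the Hive 1.2 streaming API (a sketch from memory, not verified code; the metastore URI, table, columns, and partition value are invented — check org.apache.hive.hcatalog.streaming for the exact signatures in your version):

```java
// Sketch only: every fetched TransactionBatch should be committed (or
// aborted) and closed, not merely kept alive by heartbeats, or the
// compactor's minOpenTxn never advances.
HiveEndPoint endPt = new HiveEndPoint("thrift://metastore-host:9083",
        "default", "data_aaa", Arrays.asList("20160726"));
StreamingConnection conn = endPt.newConnection(true); // true: create partition if needed
DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"id", "msg"}, ",", endPt);

TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
try {
    while (txnBatch.remainingTransactions() > 0) {
        txnBatch.beginNextTransaction();
        txnBatch.write("1,hello".getBytes());
        txnBatch.commit();          // commit on a schedule, even with little data
    }
} finally {
    txnBatch.close();               // closing the batch lets minOpenTxn advance,
    conn.close();                   // so the compactor can pick up the deltas
}
```

A client that fetches a batch and only heartbeats it keeps minOpenTxn pinned, which is exactly what blocks compaction here.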




On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko  wrote:

> Thanks for the reply, Alan. My guess about Storm was wrong. Today I got the
> same behavior with the Storm topology running.
> Anyway, I'd like to know: how can I check that a transaction batch was
> closed correctly?
>
> On Wed, Jul 27, 2016 at 8:09 PM, Alan Gates  wrote:
>
>> I don’t know the details of how the storm application that streams into
>> Hive works, but this sounds like the transaction batches weren’t getting
>> closed.  Compaction can’t happen until those batches are closed.  Do you
>> know how you had storm configured?  Also, you might ask separately on the
>> storm list to see if people have seen this issue before.
>>
>> Alan.
>>
>> > On Jul 27, 2016, at 03:31, Igor Kuzmenko  wrote:
>> >
>> > One more thing. I'm using Apache Storm to stream data in Hive. And when
>> I turned off Storm topology compactions started to work properly.
>> >
>> > On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko 
>> wrote:
>> > I'm using a Hive 1.2.1 transactional table, inserting data into it via the
>> Hive Streaming API. After some time I expected compaction to start, but it
>> didn't:
>> >
>> > Here's part of log, which shows that compactor initiator thread doesn't
>> see any delta files:
>> > 2016-07-26 18:06:52,459 INFO  [Thread-8]: compactor.Initiator
>> (Initiator.java:run(89)) - Checking to see if we should compact
>> default.data_aaa.dt=20160726
>> > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: io.AcidUtils
>> (AcidUtils.java:getAcidState(432)) - in directory hdfs://
>> sorm-master01.msk.mts.ru:8020/apps/hive/warehouse/data_aaa/dt=20160726
>> base = null deltas = 0
>> > 2016-07-26 18:06:52,496 DEBUG [Thread-8]: compactor.Initiator
>> (Initiator.java:determineCompactionType(271)) - delta size: 0 base size: 0
>> threshold: 0.1 will major compact: false
>> >
>> > But in that directory there's actually 23 files:
>> >
>> > hadoop fs -ls /apps/hive/warehouse/data_aaa/dt=20160726
>> > Found 23 items
>> > -rw-r--r--   3 storm hdfs  4 2016-07-26 17:20
>> /apps/hive/warehouse/data_aaa/dt=20160726/_orc_acid_version
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:22
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71741256_71741355
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:23
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71762456_71762555
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:25
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71787756_71787855
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:26
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71795756_71795855
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:27
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71804656_71804755
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:29
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71828856_71828955
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:30
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71846656_71846755
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:32
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71850756_71850855
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:33
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71867356_71867455
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:34
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71891556_71891655
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:36
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71904856_71904955
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:37
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71907256_71907355
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:39
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71918756_71918855
>> > drwxrwxrwx   - storm hdfs  0 2016-07-26 17:40
>> /apps/hive/warehouse/data_aaa/dt=20160726/delta_71947556_719

Re: Hive on spark

2016-07-28 Thread Mudit Kumar
Thanks Guys for the help!

Thanks,
Mudit

From:  Mich Talebzadeh 
Reply-To:  
Date:  Thursday, July 28, 2016 at 9:43 AM
To:  user 
Subject:  Re: Hive on spark

Hi,

I made a presentation in London on 20th July on this subject, in which I 
explained how to make Spark work as an execution engine for Hive.

Query Engines for Hive, MR, Spark, Tez and LLAP – Considerations! 

I'll see if I can send the presentation.

Cheers


Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com




 

On 28 July 2016 at 04:24, Mudit Kumar  wrote:
Yes Mich,exactly.

Thanks,
Mudit

From:  Mich Talebzadeh 
Reply-To:  
Date:  Thursday, July 28, 2016 at 1:08 AM
To:  user 
Subject:  Re: Hive on spark

You mean you want to run Hive using Spark as the execution engine which uses 
Yarn by default?


Something like below

hive> select max(id) from oraclehadoop.dummy_parquet;
Starting Spark Job = 8218859d-1d7c-419c-adc7-4de175c3ca6d
Query Hive on Spark job[1] stages:
2
3
Status: Running (Hive on Spark job[1])
Job Progress Format
CurrentTime StageId_StageAttemptId: 
SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount 
[StageCost]
2016-07-27 20:38:17,269 Stage-2_0: 0(+8)/24 Stage-3_0: 0/1
2016-07-27 20:38:20,298 Stage-2_0: 8(+4)/24 Stage-3_0: 0/1
2016-07-27 20:38:22,309 Stage-2_0: 11(+1)/24  Stage-3_0: 0/1
2016-07-27 20:38:23,330 Stage-2_0: 12(+8)/24  Stage-3_0: 0/1
2016-07-27 20:38:26,360 Stage-2_0: 17(+7)/24  Stage-3_0: 0/1
2016-07-27 20:38:27,386 Stage-2_0: 20(+4)/24  Stage-3_0: 0/1
2016-07-27 20:38:28,391 Stage-2_0: 21(+3)/24  Stage-3_0: 0/1
2016-07-27 20:38:29,395 Stage-2_0: 24/24 Finished   Stage-3_0: 1/1 Finished
Status: Finished successfully in 13.14 seconds
OK
1
Time taken: 13.426 seconds, Fetched: 1 row(s)


HTH

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com




 

On 27 July 2016 at 20:31, Mudit Kumar  wrote:
Hi All,

I need to configure a Hive cluster to use Spark (on YARN) as the execution engine.
I already have a running Hadoop cluster.

Can someone point me to relevant documentation?

TIA.

Thanks,
Mudit