Re: CDH 5.5 - Kudu error not enough space remaining in buffer for op

2016-05-18 Thread Abhi Basu
I have tried with batch_size=500 and still get the same error. For your
reference, I have attached some info that may help diagnose the problem.

Error: Error while applying Kudu session.: Incomplete: not enough space
remaining in buffer for op (required 46.7K, 7.00M already used


Config settings:

Kudu Tablet Server Block Cache Capacity    1 GB
Kudu Tablet Server Hard Memory Limit      16 GB
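
For reference, the statements were along these lines (a sketch, not the exact
DDL: the real table has 1,101 columns, and the key column, bucket count, and
master address below are placeholders):

  SET batch_size=500;

  CREATE TABLE chr22_kudu
  DISTRIBUTE BY HASH (sample_id) INTO 16 BUCKETS
  TBLPROPERTIES(
    'storage_handler' = 'com.cloudera.impala.hive.serde.KuduSerDe',
    'kudu.table_name' = 'chr22_kudu',
    'kudu.master_addresses' = 'pcsd-cdh2.local.com:7051',
    'kudu.key_columns' = 'sample_id'
  )
  AS SELECT * FROM chr22;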


On Wed, May 18, 2016 at 8:26 AM, William Berkeley 
wrote:

> Both options are more or less the same idea: the point is that you need
> fewer rows going into each batch so you don't go over the batch size limit.
> Follow what Todd said, as he explained it more clearly and suggested a
> better way.
>
> -Will
>
> On Wed, May 18, 2016 at 10:45 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> Thanks for the updates. I will give both options a try and report back.
>>
>> If you are interested in testing with such datasets, I can help.
>>
>> Thanks,
>>
>> Abhi
>>
>> On Wed, May 18, 2016 at 6:25 AM, Todd Lipcon  wrote:
>>
>>> Hi Abhi,
>>>
>>> Will is right that the error is client-side, and it is probably happening
>>> because your rows are so wide. Impala typically batches 1000 rows at a
>>> time when inserting into Kudu, so if each of your rows is 7-8KB, that will
>>> overflow the max buffer size that Will mentioned. This seems quite probable
>>> if your data is 1000 columns of doubles or int64s (which are 8 bytes each).
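>>>
>>> (Back-of-the-envelope: 1,101 columns x 8 bytes is about 8.8KB per row, so
>>> a 1,000-row batch is about 8.4MiB, comfortably over the 7MiB buffer.)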
>>>
>>> I don't think his suggested workaround will help, but you can try
>>> running 'set batch_size=500' before running the create table or insert
>>> query.
>>>
>>> In terms of the maximum number of supported columns: most of the workloads
>>> we are focusing on are more like typical data-warehouse tables, on the
>>> order of a couple hundred columns. Crossing into the 1000+ range enters
>>> "uncharted territory" where it's much more likely you'll hit problems like
>>> this, and quite possibly others as well. We'll be interested to hear about
>>> your experiences, though you should probably be prepared for some rough
>>> edges.
>>>
>>> -Todd
>>>
>>> On Tue, May 17, 2016 at 8:32 PM, William Berkeley <
>>> wdberke...@cloudera.com> wrote:
>>>
 Hi Abhi.

 I believe that error is actually coming from the client, not the server. See
 e.g.
 https://github.com/apache/incubator-kudu/blob/master/src/kudu/client/batcher.cc#L787
 (NB: that link is to the master branch, not the exact release you are using).

 If you look around there, you'll see that the max is set by something
 called max_buffer_size_, which appears to be hardcoded to 7 * 1024 * 1024
 bytes = 7MiB (and this is consistent with 6.96 + 0.0467 > 7).

 I think the simple workaround would be to split the CTAS into a CTAS plus an
 INSERT ... SELECT. Pick a condition that bipartitions the table, so you don't
 get errors from trying to insert the same rows twice.
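
 For example, something along these lines (a sketch: the key column and the
 bipartitioning condition are placeholders, and the Kudu table properties
 mirror a typical Impala_Kudu CTAS):

   CREATE TABLE chr22_kudu
   DISTRIBUTE BY HASH (sample_id) INTO 16 BUCKETS
   TBLPROPERTIES(
     'storage_handler' = 'com.cloudera.impala.hive.serde.KuduSerDe',
     'kudu.table_name' = 'chr22_kudu',
     'kudu.master_addresses' = 'pcsd-cdh2.local.com:7051',
     'kudu.key_columns' = 'sample_id'
   )
   AS SELECT * FROM chr22 WHERE sample_id % 2 = 0;

   INSERT INTO chr22_kudu
   SELECT * FROM chr22 WHERE sample_id % 2 = 1;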

 -Will

 On Tue, May 17, 2016 at 4:45 PM, Abhi Basu <9000r...@gmail.com> wrote:

> What is the limit of columns in Kudu?
>
> I am using the 1000 Genomes dataset, specifically the chr22 table, which has
> 500,000 rows x 1,101 columns. This table was built in Impala/HDFS. I am
> trying to create a new Kudu table as a select from that table, and I get the
> following error:
>
> Error while applying Kudu session.: Incomplete: not enough space
> remaining in buffer for op (required 46.7K, 6.96M already used
>
> When looking at http://pcsd-cdh2.local.com:8051/mem-trackers, I see
> the following. What configuration needs to be tweaked?
>
>
> Memory usage by subsystem:
>
> Id                             Parent   Limit   Current Consumption  Peak consumption
> root                           none     50.12G  4.97M                6.08M
> block_cache-sharded_lru_cache  root     none    937.9K               937.9K
> code_cache-sharded_lru_cache   root     none    1B                   1B
> server                         root     none    2.3K                 201.4K
> tablet-                        server   none    530B                 200.1K
> MemRowSet-6                    tablet-  none    265B                 265B
> txn_tracker                    tablet-  64.00M  0B                   28.5K
> DeltaMemStores                 tablet-  none    265B                 87.8K
> log_block_manager              server   none    1.8K                 2.7K
>
> Thanks,
> --
> Abhi Basu
>


>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Abhi Basu
>>
>
>


-- 
Abhi Basu
Attached query profile (excerpt):

  Query (id=314c0c293bf25601:284c797fc6e9409d)
    Summary
      Session ID: 2d4c2924eae92415:db543427a839ba9d
      Session Type: BEESWAX
      Start Time: 2016-05-18 10:36:15.681951000
      End Time: 2016-05-18 10:36:30.06357
      Query Type: DDL
      Query State: EXCEPTION
      Query Status: Error while applying Kudu session.: Incomplete: not enough
        space remaining in buffer for op (required 46.7K, 7.00M already used
      Impala Version: impalad version 2.6.0-IMPALA_KUDU-cdh5 RELEASE
        (build 82d950143cff09ee21a22c88d3f5f0d676f6bb83)
      User: root
      Connected User: root
      Delegated User:
