Any plans for "Aggregation Push down" or integrating Impala + Kudu more tightly?

2017-06-29 Thread Jason Heo
Hi,

Q1.

After reading the "Druid vs Kudu" comparison, I noticed that Druid has
aggregation push-down.

> Druid includes its own query layer that allows it to push down
> aggregations and computations directly to data nodes for faster query
> processing.


If I understand "aggregation push-down" correctly, partial aggregation is
done on the data node side, so that only a small result set has to be
transferred to the client, which could lead to a great performance gain.
(Am I right?)
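If that reading is right, the mechanism can be illustrated with a toy
example (pure illustration, not Druid or Kudu code): each data node returns
a tiny partial aggregate instead of its raw rows, and the coordinator
merges the partials:

```python
# Each inner list stands for the rows stored on one data node.
node_rows = [
    [3, 5, 7],      # data node 1
    [2, 8],         # data node 2
    [4, 4, 4, 4],   # data node 3
]

# Push-down: each node ships only (count, sum) instead of its raw rows.
partials = [(len(rows), sum(rows)) for rows in node_rows]

# The coordinator merges the small partials into the final aggregate.
total_count = sum(c for c, _ in partials)
total_sum = sum(s for _, s in partials)

print(total_count, total_sum)  # 9 41
```

Only three (count, sum) pairs cross the network instead of nine rows; at
scale, that difference is where the performance gain would come from.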

So I wanted to know whether Apache Kudu has plans for an aggregation
push-down scan feature (or already has one).

Q2.

One thing that concerns me when using Impala + Kudu is that all matching
rows have to be transferred from the Kudu tserver to the Impala process.
Usually Impala and the Kudu tserver run on the same node, so it would be
great if Impala could read Kudu tablets directly. Are there any plans for
this kind of feature?

How-to: Use Impala and Kudu Together for Analytic Workloads

says that:

> we intend to implement the Apache Arrow in-memory data format and to share
> memory between Kudu and Impala, which we expect will help with performance
> and resource usage.

What does "share memory between Kudu and Impala" mean? Has this already
been implemented?

Thanks

Regards,

Jason


Re: What does "Failed RPC negotiation" in kudu-tserver.WARNING

2017-06-17 Thread Jason Heo
Hi Jean-Daniel, Todd, and Alexey

Thank you for the replies.

Recently, I've experienced many issues but successfully resolved them with
your help. I really appreciate it.

Regards,

Jason


How to manage yearly range partition efficiently

2017-06-07 Thread Jason Heo
Hi.

This is a partition strategy of my table.

PARTITION BY HASH (...) PARTITIONS 40,
RANGE (ymd) (
PARTITION VALUES < "2015",
PARTITION "2015" <= VALUES < "2016",
PARTITION "2016" <= VALUES < "2017",
PARTITION "2017" <= VALUES
)

My concern is how to manage the RANGE (ymd) partitions for years greater
than 2017.

plan 1) using a cron job, add the 2018 partition at the end of 2017, the
2019 partition at the end of 2018, ...
- pros: no unused partitions
- cons: problems arise if next year's partition is accidentally not
created
plan 2) add all of the upcoming 10 years' partitions now
- pros: reduces that risk
- cons: 400 partitions (40 * 10 years) are created but hold no data

I prefer plan 2), but I'm wondering whether that many unnecessary
partitions lead to problems.
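For what it's worth, plan 1) can be made less risky by scripting it. A
hypothetical sketch (the table name and the exact Impala-on-Kudu DDL form
shown are assumptions for illustration) of a cron job that emits next
year's ALTER statement:

```python
import datetime

def next_year_partition_ddl(table, today=None):
    """Build the ALTER statement adding the range partition for next year."""
    today = today or datetime.date.today()
    lo, hi = today.year + 1, today.year + 2
    return (f'ALTER TABLE {table} ADD RANGE PARTITION '
            f'"{lo}" <= VALUES < "{hi}"')

# Run from cron each December, e.g. piped into impala-shell.
print(next_year_partition_ddl("my_table", datetime.date(2017, 12, 1)))
# ALTER TABLE my_table ADD RANGE PARTITION "2018" <= VALUES < "2019"
```

Alerting when the cron job fails (rather than failing silently) would
address the main con of plan 1).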

Any suggestion?

Regards,

Jason


Re: kudu-tserver died suddenly

2017-06-06 Thread Jason Heo
How can I avoid this known bug?

a. downgrade to Kudu 1.2 and upgrade again after the bug is fixed
b. decrease the mm (maintenance manager) num threads (I currently have it
set to 8)

I have data that was loaded with Kudu 1.4, and I'm using CDH 5.11.0. I'm
wondering whether it is safe to downgrade to Kudu 1.2 without reinstalling
or dropping all the data.

Thanks.

2017-06-06 15:13 GMT+09:00 Jason Heo <jason.heo@gmail.com>:

> Hi Todd,
>
> Thank you for your reply.
>
> Ok, I got it. I should have googled it before mailing ;)
>
> Regards,
>
> Jason
>
> 2017-06-06 15:03 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>
>> Hi Jason,
>>
>> It sounds like you hit https://issues.apache.org/jira/browse/KUDU-1956
>> -- it's a known bug that we haven't gotten around to fixing yet. I hadn't
>> seen it "in the wild" before, but I'll add a note to the JIRA that you hit
>> it, and try to prioritize a fix soon (eg for 1.4.1)
>>
>> -Todd
>>
>> On Mon, Jun 5, 2017 at 6:38 PM, Jason Heo <jason.heo@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I'm using this patch https://gerrit.cloudera.org/#/c/6925/
>>>
>>> One of tservers died suddenly. Here is ERROR and FATAL log.
>>>
>>> E0605 15:04:33.376554 138642 tablet.cc:1219] T
>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>> Rowset selected for compaction but not available anymore: RowSet(150)
>>>
>>> E0605 15:04:33.376605 138642 tablet.cc:1219] T
>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>> Rowset selected for compaction but not available anymore: RowSet(59)
>>>
>>> E0605 15:04:33.376615 138642 tablet.cc:1219] T
>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>> Rowset selected for compaction but not available anymore: RowSet(60)
>>>
>>> F0605 15:04:33.377100 138642 tablet.cc:1222] T
>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>> Was unable to find all rowsets selected for compaction
>>>
>>>
>>> 
>>>
>>>
>>> Could I know what's the problem? Feel free to ask any information to
>>> resolve it.
>>>
>>>
>>> Thank,
>>>
>>>
>>> Jason
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>


Re: Question about redistributing tablets on failure of a tserver.

2017-05-19 Thread Jason Heo
Thanks, @dan @Todd

This issue has been resolved via https://gerrit.cloudera.org/#/c/6925/

Regards,

Jason

2017-05-09 4:55 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> Hey Jason
>
> Sorry for the delayed response here. It looks from your ksck like copying
> is ongoing but hasn't yet finished.
>
> FWIW Will B is working on adding more informative output to ksck to help
> diagnose cases like this:
> https://gerrit.cloudera.org/#/c/6772/
>
> -Todd
>
> On Thu, Apr 13, 2017 at 11:35 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> @Dan
>>
>> I monitored with `kudu ksck` while re-replication was occurring, but I'm
>> not sure whether this output means my cluster has a problem. (It seems to
>> just indicate that one tserver stopped.)
>>
>> Would you please check it?
>>
>> Thank,
>>
>> Jason
>>
>> ```
>> ...
>> ...
>> Tablet 0e29XXX1e1e3168a4d81 of table 'impala::tbl1' is
>> under-replicated: 1 replica(s) not RUNNING
>>   a7ca07f9bXXXbbb21cfb (hostname.com:7050): RUNNING
>>   a97644XXXdb074d4380f (hostname.com:7050): RUNNING [LEADER]
>>   401b6XXX5feda1de212b (hostname.com:7050): missing
>>
>> Tablet 550XXX08f5fc94126927 of table 'impala::tbl1' is
>> under-replicated: 1 replica(s) not RUNNING
>>   aec55b4XXXdb469427cf (hostname.com:7050): RUNNING [LEADER]
>>   a7ca07f9b3d94XXX1cfb (hostname.com:7050): RUNNING
>>   31461XXX3dbe060807a6 (hostname.com:7050): bad state
>> State:   NOT_STARTED
>> Data state:  TABLET_DATA_READY
>> Last status: Tablet initializing...
>>
>> Tablet 4a1490fcXXX7a2c637e3 of table 'impala::tbl1' is
>> under-replicated: 1 replica(s) not RUNNING
>>   a7ca07f9b3d94414XXXb (hostname.com:7050): RUNNING
>>   40XXXd5b5feda1de212b (hostname.com:7050): RUNNING [LEADER]
>>   aec55b4e2acXXX9427cf (hostname.com:7050): bad state
>> State:   NOT_STARTED
>> Data state:  TABLET_DATA_COPYING
>> Last status: TabletCopy: Downloading block 05162382 (277/581)
>> ...
>> ...
>> ==
>> Errors:
>> ==
>> table consistency check error: Corruption: 52 table(s) are bad
>>
>> FAILED
>> Runtime error: ksck discovered errors
>> ```
>>
>>
>>
>> 2017-04-13 3:47 GMT+09:00 Dan Burkert <danburk...@apache.org>:
>>
>>> Hi Jason, answers inline:
>>>
>>> On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo <jason.heo@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Q1. Can I disable redistributing tablets on failure of a tserver? The
>>>> reason why I need this is described in Background.
>>>>
>>>
>>> We don't have any kind of built-in maintenance mode that would prevent
>>> this, but it can be achieved by setting a flag on each of the tablet
>>> servers.  The goal is not to disable re-replicating tablets, but instead to
>>> avoid kicking the failed replica out of the tablet groups to begin with.
>>> There is a config flag to control exactly that: 'evict_failed_followers'.
>>> This isn't considered a stable or supported flag, but it should have the
>>> effect you are looking for, if you set it to false on each of the tablet
>>> servers, by running:
>>>
>>> kudu tserver set-flag <tserver-address> evict_failed_followers false
>>> --force
>>>
>>> for each tablet server.  When you are done, set it back to the default
>>> 'true' value.  This isn't something we routinely test (especially setting
>>> it without restarting the server), so please test before trying this on a
>>> production cluster.
>>>
>>> Q2. redistribution goes on even if the failed tserver reconnected to
>>>> cluster. In my test cluster, it took 2 hours to distribute when a tserver
>>>> which has 3TB data was killed.
>>>>
>>>
>>> This seems slow.  What's the speed of your network?  How many nodes?
>>> How many tablet replicas were on the failed tserver, and were the replica
>>> sizes evenly balanced?  Next time this happens, you might try monitoring
>>> with 'kudu ksck' to ensure there aren't additional problems in the cluster 
>>> (admin guide
>>> on the ksck tool
>>> <https://github.com/apache/kudu/blob/master/docs/administration.adoc#ksck>
>>> ).
>>>
>>>
>>>> Q3. `--follower_unavailable_considered_failed_sec` can be changed
>>>> without restarting cluster?
>>>>
>>>
>>> The flag can be changed, but it comes with the same caveats as above:
>>>
'kudu tserver set-flag <tserver-address>
follower_unavailable_considered_failed_sec 900 --force'
>>>
>>>
>>> - Dan
>>>
>>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
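The per-tablet-server flag changes Dan describes above could be scripted. A
hypothetical helper (the hostnames are placeholders; it only builds the
command lines, and actually executing them, e.g. via subprocess, plus
flipping the flag back to true afterwards, is left to the operator):

```python
def set_flag_cmds(tservers, flag, value):
    # One `kudu tserver set-flag` invocation per tablet server; --force is
    # needed because the flag is not considered stable/supported.
    return [["kudu", "tserver", "set-flag", ts, flag, value, "--force"]
            for ts in tservers]

cmds = set_flag_cmds(["ts1.example.com:7050", "ts2.example.com:7050"],
                     "evict_failed_followers", "false")
for c in cmds:
    print(" ".join(c))
```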


Re: Kudu Table Design Question

2017-04-27 Thread Jason Heo
Hi David.

Thank you so much!

Regards,

Jason

2017-04-28 2:50 GMT+09:00 David Alves <davidral...@gmail.com>:

> The suggestion of 20-30 per tablet is more about the number of available
> cores than the size of the data.
> Tools like Impala derive parallelism from the number of tablets, so
> having that count adjusted to (but not necessarily equal to) the core
> count gives you a good performance tradeoff.
> Of course this is not a hard limit; the tablet server should be able to
> handle anything up to 100 reasonably well, depending on your hardware.
>
> HTH
> -david
>
>
> On Tue, Apr 25, 2017 at 9:19 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi.
>>
>> This email
>> <http://mail-archives.apache.org/mod_mbox/kudu-user/201702.mbox/%3CCALo2W-WkicyuSiErfn2bNPyDAVd%3DjxA0hLij%2BA_tdtVBTYu-wQ%40mail.gmail.com%3E>
>> (and many other resources) suggests that a tserver should have a small
>> number of tablets.
>>
>> In the above mail, Dan says that:
>>
>> >> something more like 20 or 30 would be ideal depending on hardware..
>> >> ...
>> >> I would aim for tablet size on the order of 50GiB,
>>
>>
>> I'm curious why 20~30 per tserver would be ideal. Does this mean storing
>> 1TB~1.5TB per tserver is ideal? Could someone please explain this?
>>
>> The reason I ask is that I'm currently doing capacity planning.
>>
>> Regards,
>>
>> Jason
>>
>
>


Kudu Table Design Question

2017-04-25 Thread Jason Heo
Hi.

This email (and many other resources) suggests that a tserver should have
a small number of tablets.

In the above mail, Dan says that:

>> something more like 20 or 30 would be ideal depending on hardware..
>> ...
>> I would aim for tablet size on the order of 50GiB,


I'm curious why 20~30 per tserver would be ideal. Does this mean storing
1TB~1.5TB per tserver is ideal? Could someone please explain this?
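The arithmetic behind my question, spelled out (assuming the ~50GiB target
tablet size quoted above):

```python
TARGET_TABLET_GIB = 50  # suggested tablet size from the quoted email

def data_per_tserver_tib(tablets_per_tserver):
    # Data a tserver would hold if every tablet hits the target size.
    return tablets_per_tserver * TARGET_TABLET_GIB / 1024.0

print(round(data_per_tserver_tib(20), 2))  # 0.98
print(round(data_per_tserver_tib(30), 2))  # 1.46
```

So 20~30 tablets at ~50GiB each does work out to roughly 1TB~1.5TB per
tserver, which is what prompted the question.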

The reason I ask is that I'm currently doing capacity planning.

Regards,

Jason


Re: Some bulk requests are missing when a tserver stopped

2017-04-25 Thread Jason Heo
Hi David.

>> *Were there errors, like timeouts, when you were writing the rows?*

Yes, there were many timeout errors. (The errors and stack trace were
included in my first email; I copy them below anyway.)

Note that I have two test scenarios during bulk loading.

- test 1: restart a tserver
- test 2: stop a tserver (and do not start again)

Both tests produced the errors below, but only test 2 resulted in
incomplete rows (2% loss).

I use Spark 1.6 and the org.apache.kudu:kudu-spark_2.10:1.1.0 package for
bulk loading.

java.lang.RuntimeException: failed to write 2 rows from DataFrame to Kudu;
sample errors: Timed out: RPC can not complete before timeout:
Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
[0x0001323031362D3036, 0x0001323031362D3037),
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
tablet=1e83668a9fa44883897474eaa20a7cad, attempt=25,
DeadlineTracker(timeout=3, elapsed=29298), Traces: [0ms] sending RPC to
server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying RPC
due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
reset, [597ms] querying master, [597ms] Sub rpc: GetTableLocations sending
RPC to server 50cb634c24ef426c9147cc4b7181ca11, [599ms] Sub rpc:
GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
[643ms
...
...
received from server 01d513bc5c1847c29dd89c3d21a1eb64 response Network
error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [29357ms]
delaying RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
Connection reset)}
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.
apply(KuduContext.scala:184)
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.
apply(KuduContext.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfu
n$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfu
n$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkC
ontext.scala:1869)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkC
ontext.scala:1869)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
Executor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
lExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

2017-04-26 0:28 GMT+09:00 David Alves <davidral...@gmail.com>:

> Hi Jason
>
>   Were there errors, like timeouts, when you were writing the rows?
>
> -david
>
> On Mon, Apr 24, 2017 at 7:38 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi David, Todd.
>>
>> @David
>>
>> >> *What do you mean that 2% are missing?*
>> For example, after inserting 200M rows table has only 196M rows.
>>
>> >> *How are you checking that all the rows are there?*
>> `SELECT COUNT(*) FROM tab` was executed via impala-shell.
>>
>> >> *do you still observe that rows are missing if you run the scans
>> again?*
>>
>> Yes. Two days have passed, but it still has fewer rows.
>>
>> @Todd
>>
>> Here is the output of `kudu cluster ksck -checksum_scan -tables table_name`.
>> Does it look like there are problems?
>>
>> ```
>> Table table_name is HEALTHY (440 tablet(s) checked)
>>
>> The metadata for 1 table(s) is HEALTHY
>> Using snapshot timestamp: 6115681861904150528
>> Checksum running for 5s: 849/1320 replicas remaining (3.06G from disk,
>> 19.89M rows summed)
>> Checksum running for 10s: 849/1320 replicas remaining (6.76G from disk,
>> 44.01M rows summed)
>> ...
>> ...
>> Checksum finished in 352s: 0/1320 replicas remaining (100.37G from disk,
>> 636.48M rows summed)
>> ---
>> table_name
>> ---
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 380857bbccbd4bb3bddce021ffd1d89c
>> (hostname:7050): Checksum: 0
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P 66b8e4e226844800aae13601458130b3
>> (hostname:7050): Checksum: 0
>> T 9ca22d9f67b6490986d3cd93ccfb3d58 P aec55b4e2ac140b6b57261db469427cf
>> (hostname:7050): Checksum: 0
>> T 7d0b4aa457954529bfcbe2b9842424ea P 380857bbccbd4bb3bddce021ffd1d89c
>> (hostname:7050): Checksum: 11616215297851188
>> T 7d0b4aa457954529bfcbe2b9842424ea P 66b8e4e226844800aae13601458130b3
>> (hostname:7050): Checksum: 11616215297851188
>> T 7d0b4aa457954529bfcbe2b9842424ea P a9764471770d43bdab279db074d4380f
>> (hostname:7050): Checksum: 11616215297851188
>> .

Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.

2017-04-24 Thread Jason Heo
Thanks David

Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
built by jenkins on 20170120-1037 git:
aa0b5cd5eceaefe2f971c13ab657020d96bb842a)"

My concern is that something is not being freed up cleanly and is wasting
resources. For example, I dropped a 30TB table, but there are still 3TB of
files in tablet_data, and the output of "lsof" shows that the tserver has
50M open files. So I emailed to ask how to remove the unnecessary files.

It seems I can't use "kudu fs check" though.

$ kudu fs check
Invalid argument: unknown command 'check'
Usage:
/path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
fs <command> [<args>]

<command> can be one of the following:
  dump     Dump a Kudu filesystem
  format   Format a new Kudu filesystem

Then I'll try "kudu fs check" when it becomes available in Cloudera Manager.

Thanks

2017-04-25 3:54 GMT+09:00 Mike Percy <mpe...@apache.org>:

> HI Jason,
> I would strongly recommend upgrading to Kudu 1.3.1 as 1.3.0 has a serious
> data-loss bug related to re-replication. Please see
> https://kudu.apache.org/releases/1.3.1/docs/release_notes.html (if you
> are using the Cloudera version of 1.3.0, no need to worry because it
> includes the fix for that bug).
>
> In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
> see if you have orphaned blocks. If you do, you could use the --repair
> argument to that tool to repair it if you bring your tablet server offline.
>
> That said, Kudu uses hole punching to delete data and the same container
> files may remain open even after removing data. After dropping tables, you
> should see disk usage at the file system level drop.
>
> I'm not sure that I've answered all your questions. If you have specific
> concerns, please let us know what you are worried about.
>
> Mike
>
> On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi.
>>
>> Before dropping, there were about 30 tables, 27,000 files in tablet_data
>>  directory.
>> I dropped most tables and there is ONLY one table which has 400 tablets
>> in my test Kudu cluster.
>> After dropping, there are still 27,000 files in tablet_data directory,
>> and output of /sbin/lsof is the same before dropping. (kudu tserver
>> opens almost 50M files)
>>
>> I'm curious that this can be resolved using "kudu fs check" which is
>> available at Kudu 1.4.
>>
>> I used Kudu 1.2 when executing `DROP TABLE` and currently using Kudu 1.3.0
>>
>> Regards,
>>
>> Jason
>>
>>
>


Re: Some bulk requests are missing when a tserver stopped

2017-04-24 Thread Jason Heo
Hi David, Todd.

@David

>> *What do you mean that 2% are missing?*
For example, after inserting 200M rows, the table has only 196M rows.

>> *How are you checking that all the rows are there?*
`SELECT COUNT(*) FROM tab` was executed via impala-shell.

>> *do you still observe that rows are missing if you run the scans again?*

Yes. Two days have passed, but it still has fewer rows.

@Todd

Here is the output of `kudu cluster ksck -checksum_scan -tables table_name`.
Does it look like there are problems?

```
Table table_name is HEALTHY (440 tablet(s) checked)

The metadata for 1 table(s) is HEALTHY
Using snapshot timestamp: 6115681861904150528
Checksum running for 5s: 849/1320 replicas remaining (3.06G from disk,
19.89M rows summed)
Checksum running for 10s: 849/1320 replicas remaining (6.76G from disk,
44.01M rows summed)
...
...
Checksum finished in 352s: 0/1320 replicas remaining (100.37G from disk,
636.48M rows summed)
---
table_name
---
T 9ca22d9f67b6490986d3cd93ccfb3d58 P 380857bbccbd4bb3bddce021ffd1d89c
(hostname:7050): Checksum: 0
T 9ca22d9f67b6490986d3cd93ccfb3d58 P 66b8e4e226844800aae13601458130b3
(hostname:7050): Checksum: 0
T 9ca22d9f67b6490986d3cd93ccfb3d58 P aec55b4e2ac140b6b57261db469427cf
(hostname:7050): Checksum: 0
T 7d0b4aa457954529bfcbe2b9842424ea P 380857bbccbd4bb3bddce021ffd1d89c
(hostname:7050): Checksum: 11616215297851188
T 7d0b4aa457954529bfcbe2b9842424ea P 66b8e4e226844800aae13601458130b3
(hostname:7050): Checksum: 11616215297851188
T 7d0b4aa457954529bfcbe2b9842424ea P a9764471770d43bdab279db074d4380f
(hostname:7050): Checksum: 11616215297851188
...
T 770e22b6c0284523a15e688a86ab4f68 P 0abff808ff3e4248a9ddd97b01910e6c
(hostname:7050): Checksum: 0
T 770e22b6c0284523a15e688a86ab4f68 P 0d1887e5d116477e82f655e3153afba4
(hostname:7050): Checksum: 0
T 770e22b6c0284523a15e688a86ab4f68 P 401b6963d32b42d79d5b5feda1de212b
(hostname:7050): Checksum: 0
T 982615fa3b45461084dd6d60c3af9d4b P 7092c47bd9db4195887a521f17855b23
(hostname:7050): Checksum: 11126898289076092
T 982615fa3b45461084dd6d60c3af9d4b P 380857bbccbd4bb3bddce021ffd1d89c
(hostname:7050): Checksum: 11126898289076092
T 982615fa3b45461084dd6d60c3af9d4b P a7ca07f9b3d944148b74c478bbb21cfb
(hostname:7050): Checksum: 11126898289076092
...

==
Errors:
==
error fetching info from tablet servers: Network error: Not all Tablet
Servers are reachable

FAILED
Runtime error: ksck discovered errors
```

(One of the tservers is still down, but it has no tablets of the affected
table.)

Regards,

Jason


2017-04-25 6:45 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> I think it's also worth trying 'kudu cluster ksck -checksum_scan
> <master1,master2,master3>' to perform a consistency check. This will ensure
> that the available replicas have matching data (and uses the SNAPSHOT scan
> mode to avoid the inconsistency that David mentioned above).
>
> On Mon, Apr 24, 2017 at 2:38 PM, David Alves <davidral...@gmail.com>
> wrote:
>
>> Hi Jason
>>
>>   What do you mean that 2% are missing? Were you not able to insert them
>> (got a timeout) or where there no errors but you can't see the rows as the
>> result of a scan?
>>   How are you checking that all the rows are there? Through a regular
>> scan in spark? In particular the default ReadMode for scans makes no
>> guarantees about replica recency, so it might happen that when you kill a
>> tablet server, the other chosen replica is not up-to-date and returns less
>> rows. In this case it's not that the rows are missing just that the replica
>> that served the scan doesn't have them yet.
>>   These kinds of checks should likely be done with the READ_AT_SNAPSHOT
>> ReadMode but even if you can't change ReadModes, do you still observe that
>> rows are missing if you run the scans again?
>>   Currently some throttling might be required to make sure that the
>> clients don't overload the server with writes which causes writes to start
>> timing out. More efficient bulk loads is something we're working on right
>> now.
>>
>> Best
>> David
>>
>>
>> On Sat, Apr 22, 2017 at 6:48 AM, Jason Heo <jason.heo@gmail.com>
>> wrote:
>>
>>> Hi.
>>>
>>> I'm using Apache Kudu 1.2. I'm currently testing high availability of
>>> Kudu.
>>>
>>> During bulk loading, one tserver is stopped via CDH Manager
>>> intentionally and 2% of rows are missing.
>>>
>>> I use Spark 1.6 and package org.apache.kudu:kudu-spark_2.10:1.1.0 for
>>> bulk loading.
>>>
>>> I got a error several times during insertion. Although 2% is lost when
>>> tserver is stop and not started again, If I start it right after stopped,

Re: tserver died during bulk indexing and dies again after restarting

2017-04-24 Thread Jason Heo
Hi David.

Thank you for your kind reply.

I understand, but I'm afraid I can't provide my WAL, even via your private
email, because it contains sensitive data.

Regards,

Jason

2017-04-24 15:12 GMT+09:00 David Alves <davidral...@gmail.com>:

> Hi Jason
>
>   I meant the last wal segment for the 30aaccdf7c8c496a8ad73255856a1724
> tablet on the dead server (if you don't have sensitive data in there)
>   Not sure whether you specified the flag: "--fs_wal_dir". If so it should
> be in there, if not the wals are in the same dir as the value set for
> "--fs_data_dirs".
>   A wal file has a name like: "wal-1"
>
> Best
> David
>
>
> On Sat, Apr 22, 2017 at 7:46 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi David.
>>
>> Sorry for the insufficient information.
>>
>> There are 14 nodes in my test Kudu cluster. Only one tserver has died,
>> and it has only the two log lines above.
>>
>> The other 13 nodes have the "Error trying to read ahead of the log while
>> preparing peer request: Incomplete: Op with" error 7~10 times.
>>
>> >> *Would it be possible to also get the WAL with the corrupted entry?*
>>
>> Would you please explain how to get it in more detail?
>>
>> I tried what I did again and again to reproduce same error, but it didn't
>> happen again.
>>
>> Please feel free to ask me for anything what you need to resolve.
>>
>> Regards,
>>
>> Jason
>>
>> 2017-04-23 1:56 GMT+09:00 <davidral...@gmail.com>:
>>
>>> Hi Jason
>>>
>>>   Anything else of interest in those logs? Can you share them (with
>>> just me, if you prefer)? Would it be possible to also get the WAL with
>>> the corrupted entry?
>>>   Did this happen on a single server?
>>>
>>> Best
>>> David
>>>
>>
>>
>


Re: tserver died during bulk indexing and dies again after restarting

2017-04-22 Thread Jason Heo
Hi David.

Sorry for the insufficient information.

There are 14 nodes in my test Kudu cluster. Only one tserver has died, and
it has only the two log lines above.

The other 13 nodes have the "Error trying to read ahead of the log while
preparing peer request: Incomplete: Op with" error 7~10 times.

>> *Would it be possible to also get the WAL with the corrupted entry?*

Would you please explain how to get it in more detail?

I repeated what I did again and again to reproduce the same error, but it
didn't happen again.

Please feel free to ask me for anything you need to resolve this.

Regards,

Jason

2017-04-23 1:56 GMT+09:00 :

> Hi Jason
>
>   Anything else of interest in those logs? Can you share them (with
> just me, if you prefer)? Would it be possible to also get the WAL with
> the corrupted entry?
>   Did this happen on a single server?
>
> Best
> David
>


Some bulk requests are missing when a tserver stopped

2017-04-22 Thread Jason Heo
Hi.

I'm using Apache Kudu 1.2. I'm currently testing high availability of Kudu.

During bulk loading, one tserver is stopped via CDH Manager intentionally
and 2% of rows are missing.

I use Spark 1.6 and the org.apache.kudu:kudu-spark_2.10:1.1.0 package for
bulk loading.

I got errors several times during insertion. 2% of the rows are lost when
the tserver is stopped and not started again; however, if I start it right
after stopping it, there is no loss, even though I get the same error
messages.


I watched Comcast's recent presentation at Strata Hadoop, where they said
that:

> Spark is recommended for large inserts to ensure handling failures

I'm curious whether Comcast had any issues with tserver failures, and how
I can prevent rows from being lost.
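One defensive pattern, sketched hypothetically here (this is not the
kudu-spark API; it assumes upsert semantics so re-sending already-written
rows is harmless), is to compare a post-load count against the number of
rows sent and retry on a shortfall:

```python
def load_with_verification(rows, write_rows, count_rows, max_attempts=3):
    """Write rows, then verify visibility via a count; retry on shortfall."""
    for _ in range(max_attempts):
        write_rows(rows)               # e.g. an upsert-mode bulk load job
        if count_rows() == len(rows):  # e.g. SELECT COUNT(*) afterwards
            return True
    return False

# Simulated sink that drops the last 2 rows on the first attempt (as a
# stopped tserver might cause) but keeps everything on the retry.
stored, attempts = set(), {"n": 0}

def flaky_write(rows):
    attempts["n"] += 1
    stored.update(rows if attempts["n"] > 1 else rows[:-2])

ok = load_with_verification(list(range(10)), flaky_write, lambda: len(stored))
print(ok, attempts["n"])  # True 2
```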

--

Below is a Spark error message. ("01db64" is the killed one.)


java.lang.RuntimeException: failed to write 2 rows from DataFrame to Kudu;
sample errors: Timed out: RPC can not complete before timeout:
Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
[0x0001323031362D3036, 0x0001323031362D3037),
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, tablet=
1e83668a9fa44883897474eaa20a7cad, attempt=25,
DeadlineTracker(timeout=3, elapsed=29298), Traces: [0ms] sending RPC to
server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying RPC
due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
reset, [597ms] querying master, [597ms] Sub rpc: GetTableLocations sending
RPC to server 50cb634c24ef426c9147cc4b7181ca11, [599ms] Sub rpc:
GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
[643ms
...
...
received from server 01d513bc5c1847c29dd89c3d21a1eb64 response Network
error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [29357ms]
delaying RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
Connection reset)}
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$
writeRows$1.apply(KuduContext.scala:184)
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$
writeRows$1.apply(KuduContext.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$
anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$
anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(
SparkContext.scala:1869)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(
SparkContext.scala:1869)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
--


tserver died during bulk indexing and dies again after restarting

2017-04-22 Thread Jason Heo
Hi.

I'm using Apache Kudu 1.2.

One of the tservers died during bulk indexing. Here is the log of the dead
tserver.

Could you tell me what the problem is and how to start the tserver again?

==
$ tail -f kudu-tserver.WARNING
Log file created at: 2017/04/22 14:44:59
Running on machine: hostname
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
E0422 14:44:59.401752 124050 consensus_queue.cc:415] T
7db58ed040b24faf9dac3cd729e98d74 P 58fce39a905b4971b1951929ba30fe30
[LEADER]: Error trying to read ahead of the log while preparing peer
request: Incomplete: Op with index 16588 is ahead of the local log (next
sequential op: 16588). Destination peer: Peer:
0abff808ff3e4248a9ddd97b01910e6c, Is new: false, Last received: 1.16588,
Next index: 16589, Last known committed idx: 16586, Last exchange result:
SUCCESS, Needs tablet copy: false
F0422 14:44:59.624615 125459 transaction_driver.cc:358] T
30aaccdf7c8c496a8ad73255856a1724 P 58fce39a905b4971b1951929ba30fe30 S RD-NP
Ts 6114672227972882432: Cannot cancel transactions that have already
replicated: Invalid argument: Cannot decode client schema: Duplicate column
name: id transaction:RD-NP WriteTransaction [type=REPLICA,
start_time=2017-04-22 14:44:59, state=WriteTransactionState 0xd46f49c0
[op_id=(term: 1 index: 16758), ts=6114672227972882432, rows=[]]]
=

If I start the tserver, it dies again.


tail -f kudu-tserver.INFO
Log file created at: 2017/04/22 18:06:19
Running on machine: hostname
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0422 18:06:19.268718  7058 transaction_driver.cc:358] T
30aaccdf7c8c496a8ad73255856a1724 P 58fce39a905b4971b1951929ba30fe30 S R-NP
Ts 6114672227972882432: Cannot cancel transactions that have already
replicated: Invalid argument: Cannot decode client schema: Duplicate column
name: id transaction:R-NP WriteTransaction [type=REPLICA,
start_time=2017-04-22 18:06:19, state=WriteTransactionState 0x57049c0
[op_id=(term: 1 index: 16758), ts=6114672227972882432, rows=[]]]
===


Thanks,

Jason


Table size is not decreasing after large amount of rows deleted.

2017-04-21 Thread Jason Heo
Hello. I'm using Apache Kudu 1.2.

I've deleted 30% of the rows from a 20TB table. I expected the table's size
to decrease by almost 30%.

But unfortunately, its size increased by 4.4%. (I noticed that there was no
major delta compaction during or after the deletion.)

Is there a way to reduce the table size in this situation?

Thanks.


Re: Building from Source fails on my CentOS 7.2

2017-04-17 Thread Jason Heo
Hi Todd.

Good point!

It turned out that a customized krb5 library had been pre-installed by my
infrastructure team, and it is an old version.

$ ldd /usr/lib64/libkrb5.so | grep k5cry
libk5crypto.so.3 => /usr/path/to/lib/libk5crypto.so.3 (0x7f4f17b23000)

Thanks,

Jason.

2017-04-18 4:00 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> Hi Jason,
>
> This is interesting. It seems like for some reason your libkrb5.so isn't
> properly linekd against libkrb5support.so. On a fresh CentOS 7.3 system I
> just booted, after installing krb5-devel packages, I see the symbols
> defined in the expected libraries:
>
> [root@todd-el7 ~]# ldd /usr/lib64/libkrb5.so | grep k5cry
> libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7f8c68d34000)
> [root@todd-el7 ~]# objdump -T /lib64/libk5crypto.so.3 | grep
> enctype_to_name
> 00019cd0 gDF .text  012b  k5crypto_3_MIT
> krb5_enctype_to_name
>
> Do you have the MIT krb5 dev libraries installed, or is it possible you
> have heimdal or some other krb5 implementation?
>
> -Todd
>
> On Thu, Apr 13, 2017 at 10:00 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hello.
>>
>> I'm using CentOS 7.2
>>
>> To build from Source Code, I followed the manual
>> <https://kudu.apache.org/docs/installation.html#build_from_source> (except
>> for Red Hat Developer Toolset, because I use CentOS 7.2)
>>
>> Though I failed to compile :(
>>
>> ```
>> ...
>> [ 31%] Building CXX object src/kudu/master/CMakeFiles/mas
>> ter.dir/sys_catalog.cc.o
>> [ 31%] Building CXX object src/kudu/master/CMakeFiles/mas
>> ter.dir/ts_descriptor.cc.o
>> [ 31%] Building CXX object src/kudu/master/CMakeFiles/mas
>> ter.dir/ts_manager.cc.o
>> [ 31%] Linking CXX static library ../../../lib/libmaster.a
>> [ 31%] Built target master
>> [ 31%] Built target krb5_realm_override
>> Scanning dependencies of target kudu-master
>> [ 32%] Building CXX object src/kudu/master/CMakeFiles/kud
>> u-master.dir/master_main.cc.o
>> [ 32%] Linking CXX executable ../../../bin/kudu-master
>> /usr/lib64/libkrb5.so: undefined reference to
>> `krb5_enctype_to_name@k5crypto_3_MIT'
>> /usr/lib64/libkrb5.so: undefined reference to
>> `k5_buf_free@krb5support_0_MIT'
>> /usr/lib64/libkrb5.so: undefined reference to
>> `krb5int_utf8_to_ucs4@krb5support_0_MIT'
>> ...
>> ```
>>
>> BTW, I'm building from source code so that I can use `kudu fs check`. Can I
>> use the "fs check" command if I check out the master branch?
>>
>> Versions
>> ---
>>
>> $ rpm -qf /usr/lib64/libkrb5.so
>> krb5-devel-1.14.1-27.el7_3.x86_64 <= Newest version for CentOS 7.2 yum
>> repos.
>>
>> $ g++ --version
>> g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
>> Copyright (C) 2015 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>>
>>
>> Thanks,
>>
>> Jason.
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: How to flush `block_cache_capacity_mb` easily?

2017-04-17 Thread Jason Heo
Hi, Todd.

I've temporarily pushed this patch to my repository.

https://github.com/jason-heo/kudu/commit/aff1fe181541671d2dc192ad9cb4ed2172a51826

Could you please check whether I'm on the right track?

It will take some more time before I push it to Cloudera's Gerrit, because I
have yet to test whether my modification works well and I'm not yet familiar
with the contributing process <https://kudu.apache.org/docs/contributing.html>.

Thanks,

Jason

2017-04-11 12:55 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> Sure. Here's a high-level overview of the approach:
>
> - in src/kudu/util/cache.h, you'll need to add a new method like
> 'ClearCache'. In cache.cc and nvm_cache.cc you'll need to implement the
> method. You could implement it for the NVM cache to just return
> Status::NotSupported() if your main concern is the default (DRAM) cache.
> - in tserver_service.proto, add a new RPC method called 'ClearCache'
> - in tserver.proto, define its request/response protobufs. They can
> probably be empty
> - in tablet_service.h, tablet_service.cc implement the new method. It can
> call through to BlockCache::GetInstance()->ClearCache() and then
> RespondSuccess
> - in tablet_server-test.cc add a test case which exercises this path
>
> Hope that helps
>
> -Todd
>
> On Mon, Apr 10, 2017 at 6:14 PM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Great. I would appreciate it if you could guide me on how to contribute it.
>> Then I'll try in my spare time.
>>
>> 2017-04-11 7:46 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>>
>>> On Sun, Apr 9, 2017 at 6:38 PM, Jason Heo <jason.heo@gmail.com>
>>> wrote:
>>>
>>>> Hi Todd.
>>>>
>>>> I hope you had a good weekend.
>>>>
>>>> Exactly, I'm testing the latency of cold-cache reads from SATA disks
>>>> and the performance of different schema designs as well.
>>>>
>>>> We're currently using Elasticsearch for an analytics service. ES has a
>>>> "clear cache" API, which makes it easy for me to test.
>>>>
>>>>
>>> Makes sense. I don't think it would be particularly difficult to add
>>> such an API. Any interest in contributing a patch? I'm happy to point you
>>> in the right direction, if so.
>>>
>>> -Todd
>>>
>>>
>>>> 2017-04-08 5:05 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>>>>
>>>>> Hey Jason,
>>>>>
>>>>> Can I ask what the purposes of the testing is?
>>>>>
>>>>> One thing to note is that we're currently leaving a fair bit of
>>>>> performance on the table for cold-cache reads from spinning disks. So, if
>>>>> you find that the performance is not satisfactory, it's worth being aware
>>>>> that we will likely make some significant improvements in this area in the
>>>>> future.
>>>>>
>>>>> https://issues.apache.org/jira/browse/KUDU-1289 has some details.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Fri, Apr 7, 2017 at 8:44 AM, Dan Burkert <danburk...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Jason,
>>>>>>
>>>>>> There is no command to have Kudu evict its block cache, but
>>>>>> restarting the tablet server process will have that effect.  Ideally all
>>>>>> written data will be flushed before the restart, otherwise
>>>>>> startup/bootstrap will take a bit longer. Flushing typically happens 
>>>>>> within
>>>>>> 60s of the last write.  Waiting for flush and compaction is also a
>>>>>> best-practice for read-only benchmarks.  I'm not sure if someone else on
>>>>>> the list has an easier way of determining when a flush happens, but I
>>>>>> typically look at the 'MemRowSet' memory usage for the tablet on the
>>>>>> /mem-trackers HTTP endpoint; it should show something minimal like 256B 
>>>>>> if
>>>>>> it's fully flushed and empty.  You can also see details about how much
>>>>>> memory is in the block cache on that page, if that interests you.
>>>>>>
>>>>>> - Dan
>>>>>>
>>>>>> On Thu, Apr 6, 2017 at 11:23 PM, Jason Heo <jason.heo@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi.
>>>>>>>
>>>>>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>>>>>
>>>>>>> Currently, I'm doing a performance test of Kudu.
>>>>>>>
>>>>>>> Flushing OS Page Cache is easy, but I don't know how to flush
>>>>>>> `block_cache_capacity_mb` easily.
>>>>>>>
>>>>>>> I currently execute a SELECT statement over an unrelated table to
>>>>>>> evict the cached blocks of the table under test.
>>>>>>>
>>>>>>> It is cumbersome, so I'd like to know whether there is a command for
>>>>>>> flushing the block cache (or other Kudu caches I don't know about yet).
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jason
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Question about redistributing tablets on failure of a tserver.

2017-04-14 Thread Jason Heo
@Dan

I monitored with `kudu ksck` while re-replication was occurring, but I'm not
sure whether this output means my cluster has a problem. (It seems to just
indicate that one tserver stopped.)

Would you please check it?

Thanks,

Jason

```
...
...
Tablet 0e29XXX1e1e3168a4d81 of table 'impala::tbl1' is
under-replicated: 1 replica(s) not RUNNING
  a7ca07f9bXXXbbb21cfb (hostname.com:7050): RUNNING
  a97644XXXdb074d4380f (hostname.com:7050): RUNNING [LEADER]
  401b6XXX5feda1de212b (hostname.com:7050): missing

Tablet 550XXX08f5fc94126927 of table 'impala::tbl1' is
under-replicated: 1 replica(s) not RUNNING
  aec55b4XXXdb469427cf (hostname.com:7050): RUNNING [LEADER]
  a7ca07f9b3d94XXX1cfb (hostname.com:7050): RUNNING
  31461XXX3dbe060807a6 (hostname.com:7050): bad state
State:   NOT_STARTED
Data state:  TABLET_DATA_READY
Last status: Tablet initializing...

Tablet 4a1490fcXXX7a2c637e3 of table 'impala::tbl1' is
under-replicated: 1 replica(s) not RUNNING
  a7ca07f9b3d94414XXXb (hostname.com:7050): RUNNING
  40XXXd5b5feda1de212b (hostname.com:7050): RUNNING [LEADER]
  aec55b4e2acXXX9427cf (hostname.com:7050): bad state
State:   NOT_STARTED
Data state:  TABLET_DATA_COPYING
Last status: TabletCopy: Downloading block 05162382 (277/581)
...
...
==
Errors:
==
table consistency check error: Corruption: 52 table(s) are bad

FAILED
Runtime error: ksck discovered errors
```



2017-04-13 3:47 GMT+09:00 Dan Burkert <danburk...@apache.org>:

> Hi Jason, answers inline:
>
> On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>>
>> Q1. Can I disable redistributing tablets on failure of a tserver? The
>> reason why I need this is described in Background.
>>
>
> We don't have any kind of built-in maintenance mode that would prevent
> this, but it can be achieved by setting a flag on each of the tablet
> servers.  The goal is not to disable re-replicating tablets, but instead to
> avoid kicking the failed replica out of the tablet groups to begin with.
> There is a config flag to control exactly that: 'evict_failed_followers'.
> This isn't considered a stable or supported flag, but it should have the
> effect you are looking for, if you set it to false on each of the tablet
> servers, by running:
>
> kudu tserver set-flag  evict_failed_followers false
> --force
>
> for each tablet server.  When you are done, set it back to the default
> 'true' value.  This isn't something we routinely test (especially setting
> it without restarting the server), so please test before trying this on a
> production cluster.
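The per-tserver commands above can be scripted; here is a rough sketch (the hostnames and the `set_flag_cmd` helper are illustrative, while `kudu tserver set-flag ... evict_failed_followers ... --force` is the CLI syntax described above):

```shell
#!/bin/bash
# Hypothetical tserver list; replace with your own host:port pairs.
TSERVERS="ts1.example.com:7050 ts2.example.com:7050"

# Build the set-flag command for one tserver.
# Usage: set_flag_cmd <host:port> <true|false>
set_flag_cmd() {
  echo "kudu tserver set-flag $1 evict_failed_followers $2 --force"
}

# Print the commands (pipe each line to sh to actually run them).
for ts in $TSERVERS; do
  set_flag_cmd "$ts" false   # disable eviction before maintenance
done
```

When maintenance is done, run the same loop with `true` to restore the default.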
>
> Q2. redistribution goes on even if the failed tserver reconnected to
>> cluster. In my test cluster, it took 2 hours to distribute when a tserver
>> which has 3TB data was killed.
>>
>
> This seems slow.  What's the speed of your network?  How many nodes?  How
> many tablet replicas were on the failed tserver, and were the replica sizes
> evenly balanced?  Next time this happens, you might try monitoring with
> 'kudu ksck' to ensure there aren't additional problems in the cluster (admin 
> guide
> on the ksck tool
> <https://github.com/apache/kudu/blob/master/docs/administration.adoc#ksck>
> ).
>
>
>> Q3. `--follower_unavailable_considered_failed_sec` can be changed
>> without restarting cluster?
>>
>
> The flag can be changed, but it comes with the same caveats as above:
>
> 'kudu tserver set-flag  
> follower_unavailable_considered_failed_sec
> 900 --force'
>
>
> - Dan
>
>


Building from Source fails on my CentOS 7.2

2017-04-13 Thread Jason Heo
Hello.

I'm using CentOS 7.2

To build from Source Code, I followed the manual
 (except
for Red Hat Developer Toolset, because I use CentOS 7.2)

Though I failed to compile :(

```
...
[ 31%] Building CXX object
src/kudu/master/CMakeFiles/master.dir/sys_catalog.cc.o
[ 31%] Building CXX object
src/kudu/master/CMakeFiles/master.dir/ts_descriptor.cc.o
[ 31%] Building CXX object
src/kudu/master/CMakeFiles/master.dir/ts_manager.cc.o
[ 31%] Linking CXX static library ../../../lib/libmaster.a
[ 31%] Built target master
[ 31%] Built target krb5_realm_override
Scanning dependencies of target kudu-master
[ 32%] Building CXX object
src/kudu/master/CMakeFiles/kudu-master.dir/master_main.cc.o
[ 32%] Linking CXX executable ../../../bin/kudu-master
/usr/lib64/libkrb5.so: undefined reference to
`krb5_enctype_to_name@k5crypto_3_MIT'
/usr/lib64/libkrb5.so: undefined reference to `k5_buf_free@krb5support_0_MIT
'
/usr/lib64/libkrb5.so: undefined reference to
`krb5int_utf8_to_ucs4@krb5support_0_MIT'
...
```

BTW, I'm building from source code so that I can use `kudu fs check`. Can I
use the "fs check" command if I check out the master branch?

Versions
---

$ rpm -qf /usr/lib64/libkrb5.so
krb5-devel-1.14.1-27.el7_3.x86_64 <= Newest version for CentOS 7.2 yum
repos.

$ g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Thanks,

Jason.


Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Jason Heo
Hi Dan.

Thank you for your kind reply.

My Kudu runs on CentOS 7.2 with xfs.

I'll try `kudu fs check`.

Thanks,

Jason

2017-04-13 5:47 GMT+09:00 Dan Burkert <danburk...@apache.org>:

> Adar has told me it's fine to run the new 'kudu fs check' tool against a
> Kudu 1.2 server.  It will require building locally, though.
>
> - Dan
>
> On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert <danburk...@apache.org>
> wrote:
>
>> Hi Jason,
>>
>> First question: what filesystem and OS are you running?
>>
>> This has been an ongoing area of work; we fixed a few major issues in
>> 1.2, and a few more major issues in 1.3, and have a new tool ('kudu fs
>> check') that will be released in 1.4 to diagnose and fix further issues.
>> In some cases we are underestimating the true size of the data, and in some
>> cases we are keeping around data that could be cleaned up.  I've included a
>> list of relevant JIRAs below if you are interested in specifics.  It should
>> be possible to get early access to the 'kudu fs check' tool by compiling
>> Kudu locally, but I'm going to defer to Adar on that, since he's the
>> resident expert on the subject.
>>
>> KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
>> KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
>> KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
>> KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
>>
>>
>>
>>
>> On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo <jason.heo@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>
>>> I'm estimating how many servers needed to store my data.
>>>
>>> After loading my test data sets,
>>> total_kudu_on_disk_size_across_kudu_replicas in the chart library in CDH
>>> is 27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` on each
>>> node is 39.9TB, 43% bigger than the chart value.
>>>
>>> I also observed the same difference on my another Kudu test cluster.
>>>
>>> I'm curious whether this is normal, and I'd like to know if there is a
>>> way to reduce the physical file size.
>>>
>>> Thanks,
>>>
>>> Jason.
>>>
>>>
>>>
>>>
>>>
>>
>


Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Jason Heo
Hello.

I'm using Apache Kudu 1.2 on CDH 5.10.

Background
---

I'm currently using Elasticsearch to serve a web analytics service.
Elasticsearch clusters are very easy to manage. One nice feature of ES is
that I can intentionally disable shard allocation (a shard is similar to a
Kudu tablet) so that I can restart one of the physical servers without
rebalancing shards, or rolling-restart the entire cluster.

I hope Apache Kudu has similar features.

Question
---

I found that if a tserver doesn't respond for 5 minutes, the cluster starts
re-replicating its tablets. And that "5 min" can be configured. So far so good.

Q1. Can I disable the redistribution of tablets on failure of a tserver? The
reason why I need this is described in Background.
Q2. Redistribution goes on even if the failed tserver reconnects to the
cluster. In my test cluster, it took 2 hours to redistribute when a tserver
holding 3TB of data was killed.
Q3. Can `--follower_unavailable_considered_failed_sec` be changed without
restarting the cluster?

Please ask me to clarify if anything in this email is unclear.

Thanks,

Jason.


Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Jason Heo
Hello.

I'm using Apache Kudu 1.2 on CDH 5.10.

I'm estimating how many servers needed to store my data.

After loading my test data sets,
total_kudu_on_disk_size_across_kudu_replicas in the chart library in CDH is
27.9TB, whereas the sum of `du -sh /path/to/tablet_data/data` on each node is
39.9TB, 43% bigger than the chart value.
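For reference, the 43% figure follows directly from the two totals; a quick sanity check (pure arithmetic, nothing Kudu-specific):

```shell
# How much bigger the on-disk total (39.9TB) is than the chart value (27.9TB).
awk 'BEGIN { printf "%.0f%%\n", (39.9 / 27.9 - 1) * 100 }'
# → 43%
```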

I also observed the same difference on my another Kudu test cluster.

I'm curious whether this is normal, and I'd like to know if there is a way to
reduce the physical file size.

Thanks,

Jason.


Re: How to flush `block_cache_capacity_mb` easily?

2017-04-10 Thread Jason Heo
Great. I would appreciate it if you could guide me on how to contribute it.
Then I'll try in my spare time.

2017-04-11 7:46 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> On Sun, Apr 9, 2017 at 6:38 PM, Jason Heo <jason.heo@gmail.com> wrote:
>
>> Hi Todd.
>>
>> I hope you had a good weekend.
>>
>> Exactly, I'm testing the latency of cold-cache reads from SATA disks and
>> the performance of different schema designs as well.
>>
>> We're currently using Elasticsearch for an analytics service. ES has a
>> "clear cache" API, which makes it easy for me to test.
>>
>>
> Makes sense. I don't think it would be particularly difficult to add such
> an API. Any interest in contributing a patch? I'm happy to point you in the
> right direction, if so.
>
> -Todd
>
>
>> 2017-04-08 5:05 GMT+09:00 Todd Lipcon <t...@cloudera.com>:
>>
>>> Hey Jason,
>>>
>>> Can I ask what the purposes of the testing is?
>>>
>>> One thing to note is that we're currently leaving a fair bit of
>>> performance on the table for cold-cache reads from spinning disks. So, if
>>> you find that the performance is not satisfactory, it's worth being aware
>>> that we will likely make some significant improvements in this area in the
>>> future.
>>>
>>> https://issues.apache.org/jira/browse/KUDU-1289 has some details.
>>>
>>> -Todd
>>>
>>> On Fri, Apr 7, 2017 at 8:44 AM, Dan Burkert <danburk...@apache.org>
>>> wrote:
>>>
>>>> Hi Jason,
>>>>
>>>> There is no command to have Kudu evict its block cache, but restarting
>>>> the tablet server process will have that effect.  Ideally all written data
>>>> will be flushed before the restart, otherwise startup/bootstrap will take a
>>>> bit longer. Flushing typically happens within 60s of the last write.
>>>> Waiting for flush and compaction is also a best-practice for read-only
>>>> benchmarks.  I'm not sure if someone else on the list has an easier way of
>>>> determining when a flush happens, but I typically look at the 'MemRowSet'
>>>> memory usage for the tablet on the /mem-trackers HTTP endpoint; it should
>>>> show something minimal like 256B if it's fully flushed and empty.  You can
>>>> also see details about how much memory is in the block cache on that page,
>>>> if that interests you.
>>>>
>>>> - Dan
>>>>
>>>> On Thu, Apr 6, 2017 at 11:23 PM, Jason Heo <jason.heo@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> I'm using Apache Kudu 1.2 on CDH 5.10.
>>>>>
>>>>> Currently, I'm doing a performance test of Kudu.
>>>>>
>>>>> Flushing OS Page Cache is easy, but I don't know how to flush
>>>>> `block_cache_capacity_mb` easily.
>>>>>
>>>>> I currently execute a SELECT statement over an unrelated table to
>>>>> evict the cached blocks of the table under test.
>>>>>
>>>>> It is cumbersome, so I'd like to know whether there is a command for
>>>>> flushing the block cache (or other Kudu caches I don't know about yet).
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Regards,
>>>>> Jason
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

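As a small aid for Dan's suggestion in the thread above to watch the 'MemRowSet' entries on the /mem-trackers HTTP endpoint, here is a rough sketch (the host and the `mem_rowset_rows` helper name are illustrative; 8050 is the default tserver web UI port):

```shell
#!/bin/bash
# Filter mem-tracker rows mentioning MemRowSet from page HTML on stdin.
mem_rowset_rows() {
  grep -i 'memrowset'
}

# Against a live tserver (hypothetical host):
#   curl -s "http://ts1.example.com:8050/mem-trackers" | mem_rowset_rows
#
# Demo on a canned HTML fragment; prints the matching row:
printf '<td>MemRowSet</td><td>256B</td>\n' | mem_rowset_rows
```

A fully flushed, empty tablet should show a minimal value such as 256B, as Dan describes.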

How to calculate the optimal value of `maintenance_manager_num_threads`

2017-03-24 Thread Jason Heo
Hi,

I'm using Apache Kudu 1.2 on CDH 5.10.

Recently, after reading "Bulk write performance improvements for Kudu 1.4",
I noticed that `maintenance_manager_num_threads` was set to 4 for 5 spinning
disks.

In my cluster, each node has 10 SATA disks in RAID 1+0 (the WAL and data
directories are on the same partition). As Todd suggested, bulk loading is
done in PK-sorted order. The CPU usage and system load of my cluster are not
high at the moment, so I think the setting could be increased a bit.

Would someone please suggest a value for my environment?

Thanks in advance.


Re: What does RowSet Compaction Duration means?

2017-03-14 Thread Jason Heo
Hi Alexey.

Thank you for your reply.

With your help, I can now understand what `compact_rs_duration` means. But
the `default_num_replicas` is just 3, not 5 :(

It seems that compaction on tableB has a huge impact on bulk loading into
tableA. Is there a way to minimize compaction activity (e.g., by changing
Kudu's configuration)?

The FAQ says that "Since compactions are so predictable, the only tuning
knob available is the number of threads dedicated to flushes and
compactions in the *maintenance manager*."

my `maintenance_manager_num_threads` is already 1.

Thanks.

2017-03-15 3:48 GMT+09:00 Alexey Serbin <aser...@cloudera.com>:

> Hi Jason,
>
> As I understand, that 'milliseconds / second' cryptic unit means 'number
> of units / for sampling (or averaging) interval'.
>
> I.e., they capture that metric reading (expressed in milliseconds) every
> second, subtract previous value from the current value, and declare the
> result as the result measurement at current time.  If not capturing every
> second, then it's about measuring every X seconds, do the subtraction of
> the previous from the current measurement, and then divide by X.
>
> For a single tablet, the 'compact_rs_duration' metric stands for 'Time
> spent compacting RowSets'.  As I understand,
> 'total_kudu_compact_rs_duration_sum_rate_across_kudu_replicas' is the sum of
> those measurements for all existing replicas of the specified tablet across
> the Kudu cluster.
>
> I suspect you have the replication factor of 5 for that tablet, and at
> some point all replicas become busy with rowset compaction all the time.
>
> Compactions on tables are run in the background.  Compactions on different
> tables run independently.  So, if you have some other activity doing
> inserts/updates on tableB, then it's natural to see compaction happen on
> tabletB as well.
>
>
> Best regards,
>
> Alexey
>
> On Tue, Mar 14, 2017 at 12:50 AM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi.
>>
>> I'm stuck with performance degradation when compaction happens.
>>
>> My duration is "4956.71 milliseconds / second". What does this mean? I
>> can't figure it out.
>>
>> Here is the captured image: http://imgur.com/WU9sRRq
>>
>> When I'm doing bulk indexing on tableA, sometimes compaction happens on
>> tableB. Is this normal?
>>
>> Thanks.
>>
>
>


Re: Load is high on the Kudu dedicated node.

2017-03-14 Thread Jason Heo
Hi Todd.

Thank you for your kind reply.

I'll try your recommendation and look forward for next releases.

Thanks.

2017-03-15 0:26 GMT+09:00 Todd Lipcon <t...@cloudera.com>:

> Hi Jason,
>
> By "bulk indexing only" you mean you are loading data with a high rate of
> inserts?
>
> It seems that there is a lot of contention on the memory trackers.
> https://issues.apache.org/jira/browse/KUDU-1502 is one JIRA where I noted
> this was the case. If that's the culprit, I would look into the following:
>
> - try to change your insert pattern so that it is more sequential in
> nature (random inserts will cause a lot of block cache lookups to check for
> duplicate keys)
> - if you have RAM available, increase both the block cache capacity and
> the server's memory limit accordingly, so that the bloom lookups will hit
> Kudu's cache instead of having to go to the operating system cache.
>
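For the second bullet above, the relevant tserver gflags are `--block_cache_capacity_mb` and `--memory_limit_hard_bytes`; the values below are purely illustrative and must be tuned for the node's actual RAM:

```
# Example gflags for a tserver with spare RAM (illustrative values only):
--block_cache_capacity_mb=4096
--memory_limit_hard_bytes=17179869184   # 16 GiB
```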
> Aside from that, we'll be spending some time on improving performance of
> write-heavy workloads in upcoming releases, and I think fixing this
> MemTracker contention will be one of the issues tackled.
>
> In case the above isn't the issue, do you think you could use 'perf record
> -g -a' and generate a flame graph?
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
> -Todd
>
> On Tue, Mar 14, 2017 at 6:14 AM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> Hi. I'm experiencing high load and high CPU usage. Kudu is running on 5
>> dedicated Kudu nodes. Two nodes' load is 40, while three nodes' load is 15.
>>
>> Here is the output of `perf record -a` followed by `perf report` during a
>> bulk-indexing-only workload.
>>
>> http://imgur.com/8lz1CRk
>>
>> I'm wondering whether this is a reasonable situation.
>>
>> I'm using Kudu on CDH 5.10
>>
>> Thanks.
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: AUTO_FLUSH_BACKGROUND is supported in Impala 2.7?

2017-03-14 Thread Jason Heo
Hi Jean!

Thank you for your reply.

It helped me a lot.

Thanks.

2017-03-15 0:07 GMT+09:00 Jean-Daniel Cryans <jdcry...@apache.org>:

> You can see here that it's hardcoded to use it:
> https://github.com/apache/kudu/blob/master/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala#L212
>
> J-D
>
> On Tue, Mar 14, 2017 at 3:38 AM, Jason Heo <jason.heo@gmail.com>
> wrote:
>
>> @Harsh
>>
>> Can I ask one more question?
>>
>> I wanted to know whether AUTO_FLUSH_BACKGROUND can be enabled, or is
>> already enabled by default, when using Apache Spark.
>>
>> Thanks.
>>
>> 2017-03-14 19:24 GMT+09:00 Jason Heo <jason.heo@gmail.com>:
>>
>>> @Harsh
>>>
>>> Ok. I got it.
>>>
>>> Thanks.
>>>
>>> 2017-03-14 17:41 GMT+09:00 Harsh J <ha...@cloudera.com>:
>>>
>>>> The Apache Impala (incubating) 2.8.0 sources do carry the default flush
>>>> mode of Kudu client sessions as AUTO_FLUSH_BACKGROUND:
>>>> https://github.com/apache/incubator-impala/blob/2.8.0/be/src/exec/kudu-table-sink.cc#L161-L162
>>>> (in 2.7.0 this was MANUAL)
>>>>
>>>> The CDH 5.10.0 included Apache Impala (incubating) release too has the
>>>> default flush mode set to AUTO_FLUSH_BACKGROUND as evidenced at
>>>> https://github.com/cloudera/Impala/blob/cdh5.10.0-release/be/src/exec/kudu-table-sink.cc#L161-L162.
>>>>
>>>> On Tue, 14 Mar 2017 at 12:34 Jason Heo <jason.heo@gmail.com> wrote:
>>>>
>>>>> Sorry.
>>>>>
>>>>> I've noticed that `v2.7.0` is just Impala Shell's.
>>>>>
>>>>> It seems CDH 5.10.x includes Impala 2.8. But I can't find if
>>>>> AUTO_FLUSH_BACKGROUND is added at the "New Features in Impala 2.8.x /
>>>>> CDH 5.10.x
>>>>> <https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_new_features.html#new_features_280>
>>>>> "
>>>>>
>>>>>
>>>>>
>>>>> 2017-03-14 15:47 GMT+09:00 Jason Heo <jason.heo@gmail.com>:
>>>>>
>>>>> Hi.
>>>>>
>>>>> I'm using (Impala + Kudu) on CDH 5.10
>>>>>
>>>>>
>>>>> > version;
>>>>> Shell version: Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri
>>>>> Jan 20 12:03:56 PST 2017
>>>>> Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build
>>>>> 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
>>>>>
>>>>> I've read IMPALA-4134
>>>>> <https://issues-test.apache.org/jira/browse/IMPALA-4134>, I'm
>>>>> wondering if AUTO_FLUSH_BACKGROUND is enabled by default impala 2.7
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>
>>
>


What does RowSet Compaction Duration means?

2017-03-14 Thread Jason Heo
Hi.

I'm stuck with performance degradation when compaction happens.

My duration is "4956.71 milliseconds / second". What does this mean? I can't
figure it out.

Here is the captured image: http://imgur.com/WU9sRRq

When I'm doing bulk indexing on tableA, sometimes compaction happens on
tableB. Is this normal?

Thanks.


Re: AUTO_FLUSH_BACKGROUND is supported in Impala 2.7?

2017-03-14 Thread Jason Heo
Sorry.

I've noticed that `v2.7.0` is just the Impala Shell's version.

It seems CDH 5.10.x includes Impala 2.8, but I can't find whether
AUTO_FLUSH_BACKGROUND is mentioned in "New Features in Impala 2.8.x / CDH
5.10.x
<https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_new_features.html#new_features_280>".



2017-03-14 15:47 GMT+09:00 Jason Heo <jason.heo@gmail.com>:

> Hi.
>
> I'm using (Impala + Kudu) on CDH 5.10
>
>
> > version;
> Shell version: Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20
> 12:03:56 PST 2017
> Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build
> 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
>
> I've read IMPALA-4134
> <https://issues-test.apache.org/jira/browse/IMPALA-4134>, I'm wondering
> if AUTO_FLUSH_BACKGROUND is enabled by default impala 2.7
>
> Thanks.
>


AUTO_FLUSH_BACKGROUND is supported in Impala 2.7?

2017-03-14 Thread Jason Heo
Hi.

I'm using (Impala + Kudu) on CDH 5.10


> version;
Shell version: Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20
12:03:56 PST 2017
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build
785a073cd07e2540d521ecebb8b38161ccbd2aa2)

I've read IMPALA-4134
, I'm wondering if
AUTO_FLUSH_BACKGROUND is enabled by default impala 2.7

Thanks.


Re: Apache Kudu Table is 6.6 times larger than Parquet File.

2017-03-13 Thread Jason Heo
Hi, Janne

As I mentioned, I'm using CDH 5.10. I checked it using Cloudera Manager at
"Kudu -> Chart Library"

I'm not sure whether there is another way.

Thanks.

2017-03-13 17:46 GMT+09:00 Janne Keskitalo :

> Hi
>
> How do you check the physical size of a kudu table?
>
> ​
>