Re: Improving system design logging in Spark

2016-04-20 Thread Takeshi Yamamuro
Hi, as for #1 and #2, it seems hard to capture remote/local fetching time separately because they overlap with each other: see `ShuffleBlockFetcherIterator`. IMO the current metric there (the time to fetch blocks from a queue) is enough for most users, because remote fetching

Re: Organizing Spark ML example packages

2016-04-20 Thread Joseph Bradley
Sounds good to me. I'd request that we be strict during this process about requiring *no* changes to the examples themselves, which will make review easier. On Tue, Apr 19, 2016 at 11:12 AM, Bryan Cutler wrote: > +1, adding some organization would make it easier for people to find a >

Re: Improving system design logging in Spark

2016-04-20 Thread Ted Yu
Interesting. For #3: bq. reading data from, I guess you meant reading from disk. On Wed, Apr 20, 2016 at 10:45 AM, atootoonchian wrote: > The current Spark logging mechanism can be improved by adding the following > parameters. It will help in understanding system bottlenecks and

Improving system design logging in Spark

2016-04-20 Thread atootoonchian
The current Spark logging mechanism can be improved by adding the following parameters. They will help in understanding system bottlenecks and provide useful guidance for Spark application developers designing an optimized application. 1. Shuffle Read Local Time: time for a task to read shuffle data
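To make the proposal concrete, here is a minimal standalone sketch of accumulating shuffle-read time separately for local and remote block fetches. All names here (`ShuffleReadTimer`, `record`, etc.) are hypothetical illustrations, not part of Spark's actual metrics API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: accumulate shuffle-read time separately for
// local and remote block fetches. Names are illustrative only and
// are NOT Spark's actual metrics API.
public class ShuffleReadTimer {
    private final AtomicLong localNanos = new AtomicLong();
    private final AtomicLong remoteNanos = new AtomicLong();

    // Record the measured duration of a single block fetch.
    public void record(boolean isLocal, long elapsedNanos) {
        (isLocal ? localNanos : remoteNanos).addAndGet(elapsedNanos);
    }

    public long localMillis()  { return localNanos.get()  / 1_000_000; }
    public long remoteMillis() { return remoteNanos.get() / 1_000_000; }

    public static void main(String[] args) {
        ShuffleReadTimer t = new ShuffleReadTimer();
        t.record(true,  5_000_000);   // 5 ms local read
        t.record(false, 20_000_000);  // 20 ms remote read
        System.out.println("local=" + t.localMillis() + "ms remote=" + t.remoteMillis() + "ms");
    }
}
```

As the reply in this thread notes, separating the two cleanly is hard in practice because local and remote fetches overlap in `ShuffleBlockFetcherIterator`; this sketch assumes per-fetch durations are independently measurable.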

Re: Re: Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
I found that Spark SQL loses precision and handles the data as int under some rule. The following is data retrieved via the Hive shell and via Spark SQL, using the same SQL against the same Hive table: Hive: 0.4 0.5 1.8 0.4 0.49 1.5; Spark SQL: 1 2 2. The rule seems to be: when the decimal part is < 0.5, round to 0; when the decimal part
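The rule described above (fractional part below 0.5 goes to 0, otherwise up) resembles HALF_UP rounding to zero decimal places. A minimal plain-Java sketch with `BigDecimal`, illustrating only the suspected rule and not Spark SQL's actual code path:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Standalone illustration of the suspected rounding rule (HALF_UP to
// scale 0), NOT Spark's actual implementation: fractional parts below
// 0.5 round down, 0.5 and above round up.
public class DecimalRounding {
    static BigDecimal roundToInt(String v) {
        return new BigDecimal(v).setScale(0, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(roundToInt("0.4"));  // prints 0
        System.out.println(roundToInt("0.5"));  // prints 1
        System.out.println(roundToInt("1.8"));  // prints 2
    }
}
```

If this is what is happening, it would explain both the integer-looking per-row values and the large discrepancy in the summed decimal(38,18) column reported in this thread.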

Re: Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
The output is: Spark SQL: 6828127, Hive: 6980574.1269. On 2016-04-20 18:06, FangFang Chen wrote: Hi all, please give some suggestions. Thanks. With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: Select sum(column) from table; column is

Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
Hi all, please give some suggestions. Thanks. With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: Select sum(column) from table; column is defined as decimal(38,18). Spark version: 1.5.3. Hive version: 2.0.0.

Re: RFC: Remove "HBaseTest" from examples?

2016-04-20 Thread Saisai Shao
+1, HBaseTest in the Spark examples is quite old and obsolete; the HBase connector in the HBase repo has evolved a lot, so it would be better to point users there rather than to the Spark example. Good to remove it. Thanks, Saisai On Wed, Apr 20, 2016 at 1:41 AM, Josh Rosen