Re: [Discussion] Blocklet DataMap caching in driver

2018-06-22 Thread kanaka kumar avvaru
Hi Manish,

Thanks for proposing configurable columns for the min/max cache. This will
help customers who have large data but use only a few columns in filter
conditions.
+1 for solution 1.
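
For illustration, the user-facing configuration could be a table property
along these lines (a sketch only; the property name COLUMN_META_CACHE and the
exact syntax are assumptions to be settled in the design document):

  CREATE TABLE sales (
    id INT,
    city STRING,
    amount DOUBLE
  )
  STORED BY 'carbondata'
  -- Cache driver-side min/max only for columns commonly used in filters
  -- (property name is an assumption, not final syntax).
  TBLPROPERTIES ('COLUMN_META_CACHE'='city,amount')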


Regards,
Kanaka

On Fri, Jun 22, 2018 at 11:39 AM, manishgupta88 wrote:

> Thanks Ravi for the feedback. I completely agree with you that we need to
> develop the second solution ASAP. Please find my responses to your queries
> below.
>
> 1. What if the query comes on non-cached columns, will it start reading from
> disk on the driver side for min/max?
> - If the query is on a non-cached column, then all the blocks will be selected
> and min/max pruning will be done in each executor. On the driver side there
> will not be any read: the driver is a single process, and reading min/max
> values from disk for every query would increase the pruning time. So I feel
> it is better to read in a distributed way using the executors.
>
> 2. Are we planning to cache blocklet-level information or block-level
> information on the driver side for cached columns?
> - We will provide an option for the user to cache at Block or Blocklet level.
> It will be configurable at table level and the default caching will be at
> Block level. I will cover this part in detail in the design document.
>
> 3. What is the impact if we automatically choose cached columns from the
> user query instead of letting the user configure them?
> - Every query can have different filter columns. So if we choose automatically,
> then for every different column the min/max values will be read from disk and
> loaded into the cache. This can be more cumbersome, and query time can vary
> unexpectedly, which may not be justifiable. So I feel it is better to let the
> user decide which columns should be cached.
>
> Let me know if you need any more clarification.
>
> Regards
> Manish Gupta
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: query carbondata by presto got error : tableCacheModel.carbonTable should not be null

2018-06-22 Thread 陈星宇
Hi,

We run it on a cluster. See the error log below:


2018-06-22T15:49:31.577+0800 INFO query-execution-4234 com.facebook.presto.event.query.QueryMonitor TIMELINE: Query 20180622_074931_07271_ehasj :: Transaction:[3f6d5c62-045c-4638-b2dd-4025dddb2550] :: elapsed 12ms :: planning 3ms :: scheduling 4ms :: running 4ms :: finishing 1ms :: begin 2018-06-22T15:49:31.561+08:00 :: end 2018-06-22T15:49:31.573+08:00
2018-06-22T15:49:33.926+0800 ERROR remote-task-callback-7148 com.facebook.presto.execution.StageStateMachine Stage 20180622_074933_07272_ehasj.1 failed
java.lang.NullPointerException: tableCacheModel.carbonTable should not be null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:787)
    at org.apache.carbondata.presto.CarbondataRecordSetProvider.getRecordSet(CarbondataRecordSetProvider.java:84)
    at org.apache.carbondata.presto.CarbondataPageSourceProvider.createPageSource(CarbondataPageSourceProvider.java:48)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
    at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:259)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:369)
    at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:266)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:646)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:260)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at com.facebook.presto.execution.executor.LegacyPrioritizedSplitRunner.process(LegacyPrioritizedSplitRunner.java:23)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:492)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)




2018-06-22T15:49:34.073+0800 INFO query-execution-4230 com.facebook.presto.event.query.QueryMonitor TIMELINE: Query 20180622_074933_07272_ehasj :: Transaction:[f6e244f4-5121-4e64-9606-2c994263460f] :: elapsed 206ms :: planning 195ms :: scheduling 4ms :: running 6ms :: finishing 1ms :: begin 2018-06-22T15:49:33.720+08:00 :: end 2018-06-22T15:49:33.926+08:00

 


chenxingyu
-- Original --
From:  "Bhavya Aggarwal";
Date:  Fri, Jun 22, 2018 04:02 PM
To:  "dev"; 

Subject:  Re: query carbondata by presto got error : tableCacheModel.carbonTable should not be null

 
Hi,
Are you running it locally or on a cluster? This error should not occur, but
if you send the stack trace we can resolve it.

Regards
Bhavya

On Fri, Jun 22, 2018 at 12:51 PM, 陈星宇 wrote:

> Hi,
> I queried CarbonData through Presto but got the error: tableCacheModel.carbonTable
> should not be null.
> Any idea about this issue?
>
>
> chenxingyu




-- 
*Bhavya Aggarwal*
Sr. Director
Knoldus Inc. 
+91-9910483067
Canada - USA - India - Singapore
 
 

Re: query carbondata by presto got error : tableCacheModel.carbonTable should not be null

2018-06-22 Thread Bhavya Aggarwal
Hi,
Are you running it locally or on a cluster? This error should not occur, but
if you send the stack trace we can resolve it.

Regards
Bhavya

On Fri, Jun 22, 2018 at 12:51 PM, 陈星宇 wrote:

> Hi,
> I queried CarbonData through Presto but got the error: tableCacheModel.carbonTable
> should not be null.
> Any idea about this issue?
>
>
> chenxingyu




-- 
*Bhavya Aggarwal*
Sr. Director
Knoldus Inc. 
+91-9910483067
Canada - USA - India - Singapore
 
 


query carbondata by presto got error : tableCacheModel.carbonTable should not be null

2018-06-22 Thread 陈星宇
Hi,
I queried CarbonData through Presto but got the error: tableCacheModel.carbonTable
should not be null.
Any idea about this issue?


chenxingyu

Re: S3 support

2018-06-22 Thread David CaiQiang
Hi Kunal,
 I have some questions.

*Problem (Locking):*
  Does the memory lock support multiple drivers concurrently loading data into
the same table? If not, maybe this limitation should be noted.

*Problem (Write with append mode):*
  1. Atomicity
  If the overwrite operation fails, the old file may be destroyed. It should
be possible to recover the old file.

*Problem (Alter rename):*
  If the table folder name is different from the table name, maybe the "refresh
table" command should be enhanced, for example along the lines of the sketch
below.
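
A minimal sketch, assuming the existing REFRESH TABLE command is extended to
re-register a renamed table from its folder (db_name and table_name are
placeholders):

  -- Re-register the table metadata from its store location (sketch only;
  -- the exact semantics for a renamed folder would need to be defined).
  REFRESH TABLE db_name.table_name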



-
Best Regards
David Cai
--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Blocklet DataMap caching in driver

2018-06-22 Thread manishgupta88
Thanks Ravi for the feedback. I completely agree with you that we need to
develop the second solution ASAP. Please find my responses to your queries
below.

1. What if the query comes on non-cached columns, will it start reading from
disk on the driver side for min/max?
- If the query is on a non-cached column, then all the blocks will be selected
and min/max pruning will be done in each executor. On the driver side there
will not be any read: the driver is a single process, and reading min/max
values from disk for every query would increase the pruning time. So I feel it
is better to read in a distributed way using the executors.

2. Are we planning to cache blocklet-level information or block-level
information on the driver side for cached columns?
- We will provide an option for the user to cache at Block or Blocklet level.
It will be configurable at table level and the default caching will be at
Block level. I will cover this part in detail in the design document.
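
A minimal sketch of how this option could look as a table property
(CACHE_LEVEL and its values are assumed names, not final syntax; the design
document will define the actual interface):

  -- Default caching at Block level; opt a table in to Blocklet level (sketch).
  ALTER TABLE sales SET TBLPROPERTIES ('CACHE_LEVEL'='BLOCKLET')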

3. What is the impact if we automatically choose cached columns from the
user query instead of letting the user configure them?
- Every query can have different filter columns. So if we choose automatically,
then for every different column the min/max values will be read from disk and
loaded into the cache. This can be more cumbersome, and query time can vary
unexpectedly, which may not be justifiable. So I feel it is better to let the
user decide which columns should be cached.

Let me know if you need any more clarification.

Regards
Manish Gupta



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/