Re: class not found exception

2017-05-19 Thread David CaiQiang
Please check whether the spark-catalyst jar exists in the jars folder.



-
Best Regards
David Cai
--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/classnot-found-exception-tp12861p13028.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Implementing Streaming Ingestion Support in CarbonData

2017-05-19 Thread Aniket Adnaik
Hi Prabhat,

Sorry, I missed your email dated May 15th.
Regarding the streaming feature, we already have some portion of the
implementation in place, and other parts are being worked on. There is a lot
of other Streaming Ingestion functionality that still needs to be implemented.
I have created JIRA CARBONDATA-1072
(https://issues.apache.org/jira/browse/CARBONDATA-1072) with a work breakdown
list.
Your contributions are welcome. Let's sync up on work items so that there is
no conflict of work.

Best Regards,
Aniket

On 15 May 2017 10:46 pm, "prabhatkashyap" 
wrote:

> Hello, everyone
>
> We are going to start implementing streaming ingestion support in
> CarbonData. We'll be using Scala and following the document shared on the
> dev list by Aniket Adnaik:
>
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5
> .nabble.com/DISCUSSION-New-Feature-Streaming-Ingestion-into-
> CarbonData-td9724.html
>
> Your suggestions are welcome here.
>
> Regards,
> Prabhat Kashyap
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-m
> ailing-list-archive.1130556.n5.nabble.com/Implementing-Strea
> ming-Ingestion-Support-in-CarbonData-tp12709.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>


[jira] [Created] (CARBONDATA-1072) Streaming Ingestion Feature

2017-05-19 Thread Aniket Adnaik (JIRA)
Aniket Adnaik created CARBONDATA-1072:
-

 Summary: Streaming Ingestion Feature 
 Key: CARBONDATA-1072
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1072
 Project: CarbonData
  Issue Type: New Feature
  Components: core, data-load, data-query, examples, file-format, 
spark-integration, sql
Affects Versions: 1.1.0
Reporter: Aniket Adnaik
 Fix For: 1.2.0


High-level breakdown of work items / implementation phases:
Design document will be attached soon.

Phase 1: Spark Structured Streaming with regular CarbonData format

This phase will mainly focus on supporting streaming ingestion using
Spark Structured Streaming.
1. Write path implementation
   - Integration with Spark's Structured Streaming framework
     (FileStreamSink etc.)
   - StreamingOutputWriter (StreamingOutputWriterFactory)
   - Prepare write (schema validation, segment creation,
     streaming file creation etc.)
   - StreamingRecordWriter (data conversion from Catalyst InternalRow
     to a CarbonData-compatible format; make use of the new load path)

2. Read path implementation (some overlap with phase 2)
   - Modify getSplits() to read from the streaming segment
   - Read committed info from the metadata to get correct offsets
   - Make use of the min-max index if available
   - Use sequential scan: the data is unsorted, so the Btree index
     cannot be used

3. Compaction
   - Minor compaction
   - Major compaction

4. Metadata management
   - Streaming metadata store (e.g. offsets, timestamps etc.)

5. Failure recovery
   - Rollback on failure
   - Handle asynchronous writes to CarbonData (using hflush)

Phase 2: Spark Structured Streaming with appendable CarbonData format

1. Streaming file format
   - Writers use the V3 file format for appending columnar unsorted
     data blocklets
   - Modify readers to read from the appendable streaming file format

Phase 3: Interoperability and Kafka Connect

1. Interoperability support
   - Functionality with other features/components
   - Concurrent queries with streaming ingestion
   - Concurrent operations with streaming ingestion (e.g. compaction,
     alter table, secondary index etc.)

2. Kafka Connect ingestion / CarbonData connector
   - Direct ingestion from Kafka Connect without Spark Structured
     Streaming
   - Separate Kafka connector to receive data through a network port
   - Data commit and offset management

Phase 4: Support for other streaming engines
   - Analysis of the streaming APIs/interfaces of other streaming engines
   - Implementation of connectors for different streaming engines
     (Storm, Flink, Flume, etc.)

Phase 5: In-memory streaming table (probable feature)

1. In-memory cache for streaming data
   - Fault-tolerant in-memory buffering / checkpointing with WAL
   - Readers read from in-memory tables if available
   - Background threads for writing streaming data, etc.
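The metadata-management and failure-recovery items above (phase 1, items 4 and 5) boil down to tracking a committed offset per streaming segment. A minimal in-memory sketch; all class and method names here are hypothetical illustrations, not CarbonData APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a streaming metadata store: readers only trust data up to
// the committed offset, so a write that crashed before committing is simply
// invisible -- which is the rollback-on-failure behavior described above.
// All names are hypothetical, not CarbonData APIs.
public class StreamingMetadataStore {
    private final Map<String, Long> committedOffsets = new HashMap<>();

    // Record the last durably written offset for a segment
    // (called only after hflush succeeds).
    public void commit(String segmentId, long offset) {
        committedOffsets.put(segmentId, offset);
    }

    // Offset up to which readers may consume; -1 means nothing committed yet.
    public long committedOffset(String segmentId) {
        return committedOffsets.getOrDefault(segmentId, -1L);
    }
}
```

A durable version would persist these entries (offsets plus timestamps) alongside the segment, as the phase 1 metadata store item suggests.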




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Aniket Adnaik
Congratulations Ravindra!!

Best Regards,
Aniket

On 19 May 2017 4:26 am, "Liang Chen"  wrote:

> Hi all
>
> We are pleased to announce that the PMC has invited Ravindra as new Apache
> CarbonData PMC member, and the invite has been accepted !
>
> Congrats to Ravindra and welcome aboard.
>
> Thanks
> The Apache CarbonData team
>


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread David CaiQiang
Congrats Ravindra



-
Best Regards
David Cai
--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ANNOUNCE-Ravindra-as-new-Apache-CarbonData-PMC-tp12985p13020.html


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Kunal Kapoor
Congratulations ravi

--
Regards

*Kunal Kapoor*

On Fri, May 19, 2017 at 7:55 PM, sounak  wrote:

> Congrats Ravi :)
>
> On Fri, May 19, 2017 at 7:51 PM, Kumar Vishal 
> wrote:
>
> > Congratulations Ravi :)
> > -Regards
> > Kumar Vishal
> >
> > Sent from my iPhone
> >
> > > On 19-May-2017, at 19:02, Jacky Li  wrote:
> > >
> > > Congrats Ravindra :)
> > >
> > >
> > >> On 19 May 2017, at 20:36, Gururaj Shetty wrote:
> > >>
> > >> Congratulations Ravindra
> > >>
> > >> Regards,
> > >> Gururaj
> > >>
> > >>> On Fri, May 19, 2017 at 5:31 PM, Bhavya Aggarwal  >
> > wrote:
> > >>>
> > >>> Congrats Ravindra,
> > >>>
> > >>> Regards
> > >>> Bhavya
> > >>>
> > >>> On Fri, May 19, 2017 at 4:56 PM, Liang Chen  >
> > >>> wrote:
> > >>>
> >  Hi all
> > 
> >  We are pleased to announce that the PMC has invited Ravindra as new
> > >>> Apache
> >  CarbonData PMC member, and the invite has been accepted !
> > 
> >  Congrats to Ravindra and welcome aboard.
> > 
> >  Thanks
> >  The Apache CarbonData team
> > 
> > >>>
> > >
> > >
> > >
> >
>
>
>
> --
> Thanks
> Sounak
>


[GitHub] carbondata-site pull request #42: Updated CarbonData logo and Apache Powered...

2017-05-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata-site/pull/42


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] carbondata-site pull request #41: Fixed the font issue in the subscribe, un-...

2017-05-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata-site/pull/41




[GitHub] carbondata-site pull request #42: Updated CarbonData logo and Apache Powered...

2017-05-19 Thread PallaviSingh1992
GitHub user PallaviSingh1992 opened a pull request:

https://github.com/apache/carbondata-site/pull/42

Updated CarbonData logo and Apache Powered by Logo



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PallaviSingh1992/incubator-carbondata-site 
feature/UpdateWithPoweredByLogo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata-site/pull/42.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #42


commit 1c8ca62043a8c060bf0a66e8271689b383dce360
Author: PallaviSingh1992 
Date:   2017-05-19T16:21:15Z

Updated CarbonData logo and Apache Powered by Logo






Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

2017-05-19 Thread Kumar Vishal
Hi Jacky,
The number of index files will be the number of nodes in the cluster per
segment (if there are 10 nodes, then there are 10 index files per segment).
To merge all the index files into one, we need to add some Thrift structures
to handle this, because we need to maintain the mapping from each block's
footer to its task id.

-Regards
Kumar Vishal

On Fri, May 19, 2017 at 7:02 PM, Jacky Li  wrote:

>
> > > On 19 May 2017, at 20:28, Ravindra Pesala wrote:
> >
> > Hi Vishal & Jacky,
> >
> > Yes, we need to extract interfaces for storing index in memory like
> > onheap/offheap. And same interface can be implemented to store/retrieve
> > index from external service like DB.
> >
> > Not all segments will be kept in single tree here, we try to keep one
> index
> > file to one array. Here one segment may contain multiple indexes so we
> will
> > have multiple arrays per segment.
>
> One more suggest here, we should make number of index file as less as
> possible so that improve IO to load the index, like make one index file for
> one segment only. We can do it by issuing one more mapreduce job after the
> loading carbondata file. The job will read all footers and reduce to
> generate one index file. What do you think?
>
> >
> > Yes, the ratio of blocklets inside blocks are small, so for each task we
> > will send the information of blocklets which needs to scanned inside
> block
> > so the serialization cost is very minimum.
> >
> > Regards,
> > Ravindra
> >
> > On 19 May 2017 at 08:17, Jacky Li  wrote:
> >
> >>
> >>> On 18 May 2017, at 15:31, Kumar Vishal wrote:
> >>>
> >>> Hi Ravi,
> >>>
> >>> I have few queries related to both the solution.
> >>>
> >>> 1. To reduce the java heap for Btree , we can remove the Btree data
> >>> structure and use simple single array and do binary search on it. And
> >> also
> >>> we should move this cached arrays to unsafe (offheap/onheap) to reduce
> >> the
> >>> burden on GC.
> >>>
> >>
> >> I think now is the right time to abstract driver side index into an
> >> interface, which can have multiple implementation: onheap, offheap,
> >> external service, etc.
> >>
> >>> Single array of object?? If user is loading small data in each load so
> we
> >>> will save only intermediate node of btree by maintaining in array. Yes
> by
> >>> maintaining all the metadata(startkey, min,max) in offheap will reduce
> >>> memory footprint. Is it possible to maintain data in single byte array
> >> and
> >>> maintaining the offset of each key for each column??
> >>>
> >>
> >> Do you mean put to all indexes for all segments in a single array? In
> that
> >> case, we can compare in-memory sequential scan and binary search which
> >> introduce random walk on memory. I feel sequential scan may beat random
> >> walk if the number of element is not many.
> >>
> >>> 2. Unify the btree to single Btree instead of 2 and load at driver
> side.
> >>> So that only one lookup can be done to find the blocklets directly. And
> >>> executors are not required to load the btree for every query.
> >>>
> >>> If we unify the btree to single btree, then it will be loaded in driver
> >>> side, so driver will need more memory to maintain this metadata and for
> >>> each query driver needs to do blocklet level pruning, in case of
> >> concurrent
> >>> query driver may be overloaded. And how we will send this metadata
> >>> information (offset of each column in file and other metadata) to
> >> executor,
> >>> serializing and deserializing this metadata cost also we need to
> >> consider.
> >>>
> >>
> >> Assuming there is only 4~8 blocklet in block, I think for a block, we
> can
> >> sent blockletId using a byte array. Should be ok, right?
> >>
> >>> Please correct me if my comments are not valid.
> >>>
> >>> -Regards
> >>> Kumar Vishal
> >>>
> >>>
> >>> On Thu, May 18, 2017 at 12:33 PM, Ravindra Pesala <
> ravi.pes...@gmail.com
> >>>
> >>> wrote:
> >>>
>  Hi Liang,
> 
>  Yes Liang , it will be done in 2 parts. At first reduce the size of
> the
>  btree and then merge the driver side and executor btree to single
> btree.
> 
>  Regards,
>  Ravindra.
> 
>  On 17 May 2017 at 19:28, Liang Chen  wrote:
> 
> > Hi Ravi
> >
> > Thank you bringing this improvement discussion to mailing list.
> >
> > One question , the point1 how to solve the below issues ? there are
> >> still
> > two part index info in driver and executor side ?
> > 
> > 
> > And also chances of loading btree on each executor is more for every
>  query
> > because there is no guarantee that same block goes to same executor
> >> every
> > time. It will be worse in case of dynamic containers.
> >
> > Regards
> > Liang
> >
> > 2017-05-17 7:33 GMT-04:00 Ravindra Pesala :
> >
> >> Hi,
> >>
> >> *1. Current problem.*
> >> 1.There is more size taking on java heap to create Btree for index
>  file.
> >> It is beca

Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread sounak
Congrats Ravi :)

On Fri, May 19, 2017 at 7:51 PM, Kumar Vishal 
wrote:

> Congratulations Ravi :)
> -Regards
> Kumar Vishal
>
> Sent from my iPhone
>
> > On 19-May-2017, at 19:02, Jacky Li  wrote:
> >
> > Congrats Ravindra :)
> >
> >
> >> On 19 May 2017, at 20:36, Gururaj Shetty wrote:
> >>
> >> Congratulations Ravindra
> >>
> >> Regards,
> >> Gururaj
> >>
> >>> On Fri, May 19, 2017 at 5:31 PM, Bhavya Aggarwal 
> wrote:
> >>>
> >>> Congrats Ravindra,
> >>>
> >>> Regards
> >>> Bhavya
> >>>
> >>> On Fri, May 19, 2017 at 4:56 PM, Liang Chen 
> >>> wrote:
> >>>
>  Hi all
> 
>  We are pleased to announce that the PMC has invited Ravindra as new
> >>> Apache
>  CarbonData PMC member, and the invite has been accepted !
> 
>  Congrats to Ravindra and welcome aboard.
> 
>  Thanks
>  The Apache CarbonData team
> 
> >>>
> >
> >
> >
>



-- 
Thanks
Sounak


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Kumar Vishal
Congratulations Ravi :)
-Regards
Kumar Vishal

Sent from my iPhone

> On 19-May-2017, at 19:02, Jacky Li  wrote:
> 
> Congrats Ravindra :)
> 
> 
>> On 19 May 2017, at 20:36, Gururaj Shetty wrote:
>> 
>> Congratulations Ravindra
>> 
>> Regards,
>> Gururaj
>> 
>>> On Fri, May 19, 2017 at 5:31 PM, Bhavya Aggarwal  wrote:
>>> 
>>> Congrats Ravindra,
>>> 
>>> Regards
>>> Bhavya
>>> 
>>> On Fri, May 19, 2017 at 4:56 PM, Liang Chen 
>>> wrote:
>>> 
 Hi all
 
 We are pleased to announce that the PMC has invited Ravindra as new
>>> Apache
 CarbonData PMC member, and the invite has been accepted !
 
 Congrats to Ravindra and welcome aboard.
 
 Thanks
 The Apache CarbonData team
 
>>> 
> 
> 
> 


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Jacky Li
Congrats Ravindra :)


> On 19 May 2017, at 20:36, Gururaj Shetty wrote:
> 
> Congratulations Ravindra
> 
> Regards,
> Gururaj
> 
> On Fri, May 19, 2017 at 5:31 PM, Bhavya Aggarwal  wrote:
> 
>> Congrats Ravindra,
>> 
>> Regards
>> Bhavya
>> 
>> On Fri, May 19, 2017 at 4:56 PM, Liang Chen 
>> wrote:
>> 
>>> Hi all
>>> 
>>> We are pleased to announce that the PMC has invited Ravindra as new
>> Apache
>>> CarbonData PMC member, and the invite has been accepted !
>>> 
>>> Congrats to Ravindra and welcome aboard.
>>> 
>>> Thanks
>>> The Apache CarbonData team
>>> 
>> 





Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

2017-05-19 Thread Jacky Li

> On 19 May 2017, at 20:28, Ravindra Pesala wrote:
> 
> Hi Vishal & Jacky,
> 
> Yes, we need to extract interfaces for storing index in memory like
> onheap/offheap. And same interface can be implemented to store/retrieve
> index from external service like DB.
> 
> Not all segments will be kept in single tree here, we try to keep one index
> file to one array. Here one segment may contain multiple indexes so we will
> have multiple arrays per segment.

One more suggestion here: we should keep the number of index files as small as
possible to improve the IO of loading the index, e.g. make only one index file
per segment. We can do it by issuing one more MapReduce job after loading the
CarbonData files. The job will read all the footers and reduce them to
generate one index file. What do you think?
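The reduce step of such a merge job could be sketched as follows; `FooterEntry` and `merge()` are illustrative stand-ins, not CarbonData code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the suggested merge job's reduce step: the entries of
// all per-node index files for a segment are concatenated and sorted by start
// key, yielding one index file per segment instead of one per node.
public class IndexMerger {

    public static class FooterEntry {
        public final String blockPath;  // data file this footer describes
        public final int startKey;      // start key of the block (int for brevity)

        public FooterEntry(String blockPath, int startKey) {
            this.blockPath = blockPath;
            this.startKey = startKey;
        }
    }

    // One input list per node's index file; output is the merged segment index.
    public static List<FooterEntry> merge(List<List<FooterEntry>> perNodeEntries) {
        List<FooterEntry> merged = new ArrayList<>();
        for (List<FooterEntry> entries : perNodeEntries) {
            merged.addAll(entries);
        }
        merged.sort(Comparator.comparingInt((FooterEntry e) -> e.startKey));
        return merged;
    }
}
```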

> 
> Yes, the ratio of blocklets inside blocks are small, so for each task we
> will send the information of blocklets which needs to scanned inside block
> so the serialization cost is very minimum.
> 
> Regards,
> Ravindra
> 
> On 19 May 2017 at 08:17, Jacky Li  wrote:
> 
>> 
>>> On 18 May 2017, at 15:31, Kumar Vishal wrote:
>>> 
>>> Hi Ravi,
>>> 
>>> I have few queries related to both the solution.
>>> 
>>> 1. To reduce the java heap for Btree , we can remove the Btree data
>>> structure and use simple single array and do binary search on it. And
>> also
>>> we should move this cached arrays to unsafe (offheap/onheap) to reduce
>> the
>>> burden on GC.
>>> 
>> 
>> I think now is the right time to abstract driver side index into an
>> interface, which can have multiple implementation: onheap, offheap,
>> external service, etc.
>> 
>>> Single array of object?? If user is loading small data in each load so we
>>> will save only intermediate node of btree by maintaining in array. Yes by
>>> maintaining all the metadata(startkey, min,max) in offheap will reduce
>>> memory footprint. Is it possible to maintain data in single byte array
>> and
>>> maintaining the offset of each key for each column??
>>> 
>> 
>> Do you mean put to all indexes for all segments in a single array? In that
>> case, we can compare in-memory sequential scan and binary search which
>> introduce random walk on memory. I feel sequential scan may beat random
>> walk if the number of element is not many.
>> 
>>> 2. Unify the btree to single Btree instead of 2 and load at driver side.
>>> So that only one lookup can be done to find the blocklets directly. And
>>> executors are not required to load the btree for every query.
>>> 
>>> If we unify the btree to single btree, then it will be loaded in driver
>>> side, so driver will need more memory to maintain this metadata and for
>>> each query driver needs to do blocklet level pruning, in case of
>> concurrent
>>> query driver may be overloaded. And how we will send this metadata
>>> information (offset of each column in file and other metadata) to
>> executor,
>>> serializing and deserializing this metadata cost also we need to
>> consider.
>>> 
>> 
>> Assuming there is only 4~8 blocklet in block, I think for a block, we can
>> sent blockletId using a byte array. Should be ok, right?
>> 
>>> Please correct me if my comments are not valid.
>>> 
>>> -Regards
>>> Kumar Vishal
>>> 
>>> 
>>> On Thu, May 18, 2017 at 12:33 PM, Ravindra Pesala >> 
>>> wrote:
>>> 
 Hi Liang,
 
 Yes Liang , it will be done in 2 parts. At first reduce the size of the
 btree and then merge the driver side and executor btree to single btree.
 
 Regards,
 Ravindra.
 
 On 17 May 2017 at 19:28, Liang Chen  wrote:
 
> Hi Ravi
> 
> Thank you bringing this improvement discussion to mailing list.
> 
> One question , the point1 how to solve the below issues ? there are
>> still
> two part index info in driver and executor side ?
> 
> 
> And also chances of loading btree on each executor is more for every
 query
> because there is no guarantee that same block goes to same executor
>> every
> time. It will be worse in case of dynamic containers.
> 
> Regards
> Liang
> 
> 2017-05-17 7:33 GMT-04:00 Ravindra Pesala :
> 
>> Hi,
>> 
>> *1. Current problem.*
>> 1.There is more size taking on java heap to create Btree for index
 file.
>> It is because we create multiple objects for each leaf node so it
>> takes
>> more memory inside heap than actual size of index file. while doing
>> LRU
>> cache also we are considering only index file size instead of objects
> size
>> so it impacts the eviction process of LRU cache.
>> 2. Currently we load one btree on driver side to find the blocks and
> load
>> another btree on executor side to find the blocklets. After we have
>> increased the blocklet size to 128 mb and decrease the table_block
>> size
> to
>> 256 mb the number of nodes inside driver side btree

Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Gururaj Shetty
Congratulations Ravindra

Regards,
Gururaj

On Fri, May 19, 2017 at 5:31 PM, Bhavya Aggarwal  wrote:

> Congrats Ravindra,
>
> Regards
> Bhavya
>
> On Fri, May 19, 2017 at 4:56 PM, Liang Chen 
> wrote:
>
> > Hi all
> >
> > We are pleased to announce that the PMC has invited Ravindra as new
> Apache
> > CarbonData PMC member, and the invite has been accepted !
> >
> > Congrats to Ravindra and welcome aboard.
> >
> > Thanks
> > The Apache CarbonData team
> >
>


Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

2017-05-19 Thread Ravindra Pesala
Hi Vishal & Jacky,

Yes, we need to extract interfaces for storing the index in memory, like
onheap/offheap. The same interface can then be implemented to store/retrieve
the index from an external service like a DB.

Not all segments will be kept in a single tree here; we try to keep one index
file in one array. One segment may contain multiple indexes, so we will have
multiple arrays per segment.

Yes, the ratio of blocklets inside blocks is small, so for each task we will
send the information about the blocklets which need to be scanned inside the
block; the serialization cost is therefore minimal.

Regards,
Ravindra
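The last point above (ship only the blocklets that need scanning) can be sketched with min-max pruning over plain int ranges. This is illustrative only; real CarbonData min-max metadata is per column and byte-encoded:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative blocklet pruning at the driver: keep only blocklets whose
// [min, max] range can contain the filter value, and ship the surviving ids
// as a byte array (enough when a block holds only 4-8 blocklets).
public class MinMaxPruner {

    // min[i] and max[i] describe blocklet i's value range for the filter column.
    public static byte[] prune(int[] min, int[] max, int filterValue) {
        List<Byte> hits = new ArrayList<>();
        for (int i = 0; i < min.length; i++) {
            if (filterValue >= min[i] && filterValue <= max[i]) {
                hits.add((byte) i);  // blocklet may contain the value: keep it
            }
        }
        byte[] blockletIds = new byte[hits.size()];
        for (int i = 0; i < blockletIds.length; i++) {
            blockletIds[i] = hits.get(i);
        }
        return blockletIds;
    }
}
```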

On 19 May 2017 at 08:17, Jacky Li  wrote:

>
> > On 18 May 2017, at 15:31, Kumar Vishal wrote:
> >
> > Hi Ravi,
> >
> > I have few queries related to both the solution.
> >
> > 1. To reduce the java heap for Btree , we can remove the Btree data
> > structure and use simple single array and do binary search on it. And
> also
> > we should move this cached arrays to unsafe (offheap/onheap) to reduce
> the
> > burden on GC.
> >
>
> I think now is the right time to abstract the driver side index into an
> interface, which can have multiple implementations: onheap, offheap,
> external service, etc.
>
> > Single array of object?? If user is loading small data in each load so we
> > will save only intermediate node of btree by maintaining in array. Yes by
> > maintaining all the metadata(startkey, min,max) in offheap will reduce
> > memory footprint. Is it possible to maintain data in single byte array
> and
> > maintaining the offset of each key for each column??
> >
>
> Do you mean putting all indexes for all segments in a single array? In that
> case, we can compare in-memory sequential scan against binary search, which
> introduces random walks over memory. I feel sequential scan may beat random
> walk if the number of elements is small.
>
> > 2. Unify the btree to single Btree instead of 2 and load at driver side.
> > So that only one lookup can be done to find the blocklets directly. And
> > executors are not required to load the btree for every query.
> >
> > If we unify the btree to single btree, then it will be loaded in driver
> > side, so driver will need more memory to maintain this metadata and for
> > each query driver needs to do blocklet level pruning, in case of
> concurrent
> > query driver may be overloaded. And how we will send this metadata
> > information (offset of each column in file and other metadata) to
> executor,
> > serializing and deserializing this metadata cost also we need to
> consider.
> >
>
> Assuming there are only 4~8 blocklets in a block, I think for a block we can
> send the blockletIds using a byte array. Should be ok, right?
>
> > Please correct me if my comments are not valid.
> >
> > -Regards
> > Kumar Vishal
> >
> >
> > On Thu, May 18, 2017 at 12:33 PM, Ravindra Pesala  >
> > wrote:
> >
> >> Hi Liang,
> >>
> >> Yes Liang , it will be done in 2 parts. At first reduce the size of the
> >> btree and then merge the driver side and executor btree to single btree.
> >>
> >> Regards,
> >> Ravindra.
> >>
> >> On 17 May 2017 at 19:28, Liang Chen  wrote:
> >>
> >>> Hi Ravi
> >>>
> >>> Thank you bringing this improvement discussion to mailing list.
> >>>
> >>> One question , the point1 how to solve the below issues ? there are
> still
> >>> two part index info in driver and executor side ?
> >>> 
> >>> 
> >>> And also chances of loading btree on each executor is more for every
> >> query
> >>> because there is no guarantee that same block goes to same executor
> every
> >>> time. It will be worse in case of dynamic containers.
> >>>
> >>> Regards
> >>> Liang
> >>>
> >>> 2017-05-17 7:33 GMT-04:00 Ravindra Pesala :
> >>>
>  Hi,
> 
>  *1. Current problem.*
>  1. The Btree we build for an index file takes a lot of space on the Java
>  heap, because we create multiple objects for each leaf node, so it takes
>  more memory inside the heap than the actual size of the index file. While
>  doing LRU caching we also consider only the index file size instead of
>  the object sizes, so it impacts the eviction process of the LRU cache.
>  2. Currently we load one btree on the driver side to find the blocks and
>  load another btree on the executor side to find the blocklets. After we
>  increased the blocklet size to 128 mb and decreased the table_block size
>  to 256 mb, the number of nodes inside the driver side btree and the
>  executor side btree is not much different, so it is overhead to read the
>  same information twice. Also, the chances of loading the btree on each
>  executor are higher for every query, because there is no guarantee that
>  the same block goes to the same executor every time. It will be worse in
>  the case of dynamic containers.
> 
>  *2. Proposed solution.*
>  1. To reduce the java heap used by the Btree, we can remove the Btree
>  data structure, use a simple single array, and do binary search on it.
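The single-sorted-array alternative in point 1 can be sketched like this. Keys are plain ints here for brevity; the real index would hold start keys and min-max metadata, possibly in offheap memory:

```java
import java.util.Arrays;

// Sketch of replacing the Btree with one sorted array plus binary search.
// locate() returns the index of the last entry whose start key is <= key,
// i.e. the only candidate blocklet range that could contain the key.
public class ArrayIndex {
    private final int[] startKeys;  // sorted start key of each blocklet

    public ArrayIndex(int[] sortedStartKeys) {
        this.startKeys = sortedStartKeys;
    }

    public int locate(int key) {
        int pos = Arrays.binarySearch(startKeys, key);
        if (pos >= 0) {
            return pos;              // key is exactly a start key
        }
        int insertionPoint = -pos - 1;
        return insertionPoint - 1;   // -1 means the key precedes all blocklets
    }
}
```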

Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Bhavya Aggarwal
Congrats Ravindra,

Regards
Bhavya

On Fri, May 19, 2017 at 4:56 PM, Liang Chen  wrote:

> Hi all
>
> We are pleased to announce that the PMC has invited Ravindra as new Apache
> CarbonData PMC member, and the invite has been accepted !
>
> Congrats to Ravindra and welcome aboard.
>
> Thanks
> The Apache CarbonData team
>


[ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Liang Chen
Hi all

We are pleased to announce that the PMC has invited Ravindra as new Apache
CarbonData PMC member, and the invite has been accepted !

Congrats to Ravindra and welcome aboard.

Thanks
The Apache CarbonData team


[jira] [Created] (CARBONDATA-1071) test cases of TestSortColumns class will never fail

2017-05-19 Thread SWATI RAO (JIRA)
SWATI RAO created CARBONDATA-1071:
-

 Summary: test cases of TestSortColumns class will never fail
 Key: CARBONDATA-1071
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1071
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.1.0
 Environment: test
Reporter: SWATI RAO
 Fix For: 1.1.0


 test("create table with direct-dictioanry sort_columns") {

sql("CREATE TABLE sorttable3 (empno int, empname String, designation 
String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, 
deptno int, deptname String, projectcode int, projectjoindate Timestamp, 
projectenddate Timestamp,attendance int,utilization int,salary int) STORED BY 
'org.apache.carbondata.format' ")

sql(s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE 
sorttable3 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '\"')""")
sql("select doj from sorttable3").show()
sql("select doj from sorttable3 order by doj").show()
checkAnswer(sql("select doj from sorttable3"), sql("select doj from 
sorttable3 order by doj"))
  }

result:
++
| doj|
++
|2010-12-29 00:00:...|
|2007-01-17 00:00:...|
|2011-11-09 00:00:...|
|2015-12-01 00:00:...|
|2013-09-22 00:00:...|
|2008-05-29 00:00:...|
|2009-07-07 00:00:...|
|2012-10-14 00:00:...|
|2015-05-12 00:00:...|
|2014-08-15 00:00:...|
++

| doj|
++
|2007-01-17 00:00:...|
|2008-05-29 00:00:...|
|2009-07-07 00:00:...|
|2010-12-29 00:00:...|
|2011-11-09 00:00:...|
|2012-10-14 00:00:...|
|2013-09-22 00:00:...|
|2014-08-15 00:00:...|
|2015-05-12 00:00:...|
|2015-12-01 00:00:...|
++

The test case passed, but it should fail:

checkAnswer(sql("select doj from sorttable3"), sql("select doj from sorttable3 order by doj"))

This check only validates the data, not the order of the data, which is the
real purpose for which sort columns are used.

To make sure we are able to verify the functionality of sort columns, the
test cases must be modified.
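The missing order check can be illustrated outside Spark too: a checkAnswer-style comparison treats results as bags, so a sort_columns test additionally needs something like the following (illustrative Java, not the Spark test utility):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// An order-sensitive check: true only if the rows are already in ascending
// order, which is the property the sorttable3 test actually wants to verify.
// A set/bag comparison like checkAnswer passes even for shuffled rows.
public class SortedCheck {
    public static boolean isSorted(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return rows.equals(sorted);
    }
}
```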





[jira] [Created] (CARBONDATA-1070) Not In Filter Expression throwing NullPointer Exception

2017-05-19 Thread sounak chakraborty (JIRA)
sounak chakraborty created CARBONDATA-1070:
--

 Summary: Not In Filter Expression throwing NullPointer Exception
 Key: CARBONDATA-1070
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1070
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 1.2.0
Reporter: sounak chakraborty
Assignee: sounak chakraborty


Not In Filter Expression throwing NullPointer Exception





Re: carbondata python API

2017-05-19 Thread Ravindra Pesala
Hi,

We have not implemented or tested a Python API for CarbonData, but it may
work if you use the datasource API and DataFrames.

Regards,
Ravindra

On 19 May 2017 at 12:29, 风云际会 <1141982...@qq.com> wrote:

> Does CarbonData have a Python API?




-- 
Thanks & Regards,
Ravi


[GitHub] carbondata-site pull request #41: Fixed the font issue in the subscribe, un-...

2017-05-19 Thread PallaviSingh1992
GitHub user PallaviSingh1992 opened a pull request:

https://github.com/apache/carbondata-site/pull/41

Fixed the font issue in the subscribe, un-subscribe and archive link



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PallaviSingh1992/incubator-carbondata-site 
feature/UpdateFonts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata-site/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #41


commit dad85602a66b7f3260c0ccb4afa030a2e22da84a
Author: PallaviSingh1992 
Date:   2017-05-19T08:11:36Z

Fixed the font issue in the subscribe, un-subscribe and archive link






carbondata python API

2017-05-19 Thread 风云际会
Does CarbonData have a Python API?