Re: [VOTE] Apache CarbonData 2.3.1-rc1 release

2023-11-21 Thread Kunal Kapoor
+1 for the release

Thanks
Kunal Kapoor

On Tue, 21 Nov, 2023, 1:07 am Liang Chen,  wrote:

> Hi
>
>
>
> I submit the Apache CarbonData 2.3.1 (RC1) for your vote.
>
> 1. Key fixes in this release are highlighted below:
>
>  - Fixed performance issues by changing the index id.
>
>  - Fixed Secondary Index issues up to segment level with SI as
> datamap, making Secondary Index a coarse-grain datamap and using SI for
> Presto queries.
>
>  - Fixed exception when loading data with overwrite on a
> partition table.
>
>  - Fixed spark integration compile issues.
>
>  - Fixed index issues with "sort_columns".
>
>  - Fixed index issues with tables across multiple sessions.
>
>  - Fixed DDL statement failure when querying columns.
>
>
>
>  2. The tag to be voted upon: apache-carbondata-2.3.1-rc1:
>
> https://github.com/apache/carbondata/commit/d326118db86414a7895a76dafebb881ba1e52c1c
>
>
> 3. The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/carbondata/2.3.1-rc1/
>
>
> 4. A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1075/
>
>
> 5. Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/chenliang613.asc
>
> Please vote on releasing this package as Apache CarbonData 2.3.1. The vote
> will be open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 2.3.1
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>


Re: Dear community

2023-10-19 Thread Kunal Kapoor
Hey Liang and Xu Bo,
AI seems to be a good direction to move forward.

Ray is also a good option to integrate Carbondata with. It is getting quite
popular and has a strong place in the ML stack.

I suggest upgrading to newer spark versions as they have many good features
for AI/ML.
Also we should upgrade the spark version frequently to leverage these
features.

Another idea that popped into my head is if Carbondata can help in an
offline Feature Store for the model training.
Not sure whether this is feasible or even the right approach, need to
brainstorm on this.

Thanks
Kunal Kapoor


On Thu, 19 Oct, 2023, 7:36 pm Bo Xu,  wrote:

> Agreed. CarbonData focused on big data before and has only limited
> integration with AI, such as PyCarbon, which lets PyTorch and TensorFlow
> read data from CarbonData. AI is very popular recently, and many
> customers need a unified data format and storage for big data and AI.
> I suggest:
> 1. support developer tools integrating CarbonData, such as Jupyter Notebook
> and Zeppelin,
> 2. improve usability of CarbonData, such as supporting running CarbonData on
> Docker and Kubernetes easily,
> 3. support/enhance integration of different AI frameworks with CarbonData,
> such as TensorFlow/PyTorch/Ray.
>
> I hope CarbonData can become a unified data format and datastore for
> big data, warehouse, and AI, so users can use the same CarbonData data in
> different compute engines, such as Spark/Flink/TensorFlow/PyTorch.
>
>
> On 2023/10/19 07:58:14 Liang Chen wrote:
> > As you know, Carbondata as a datastore and data format is already quite good
> > and mature.
> > I want to create this thread on the mailing list to openly discuss what the
> > next milestones of the carbondata project should be.
> > One proposal from my side: we should consider how to integrate with AI
> > computing engines.
> >
> > Regards
> > Liang
> >
>


Re: [ANNOUNCE] Bo Xu as new PMC for Apache CarbonData

2023-04-26 Thread Kunal Kapoor
Congratulations Xubo

Regards
Kunal Kapoor

On Tue, Apr 25, 2023 at 10:53 PM Akash r  wrote:

> Congratulations Xubo
>
> Regards
> Akash R Nilugal
>
> On Tue, 25 Apr 2023 at 12:28 AM, Liang Chen 
> wrote:
>
> > *Hi *
> >
> >
> > *We are pleased to announce Bo Xu as a new PMC member for Apache CarbonData.*
> >
> >
> > *Congrats to **Bo Xu**!*
> >
> >
> > *Apache CarbonData PMC*
> >
>


Re: [DISCUSSION] Incremental Dataload of Average aggregate in MV

2022-03-22 Thread Kunal Kapoor
Hi Shreelekhya,
+1 for this feature,

I could not understand how you would handle compatibility: for
the old tables the SQL doesn't have to be rewritten, while for new tables
it does. So how will you decide this?
The same scenario applies to loads on old tables.

Also, we need to benchmark the impact on query performance for a large number
of segments.

On Tue, Mar 22, 2022 at 10:28 AM Indhumathi 
wrote:

> Hello Shreelekhya,
>
> Please find the comments inline in the design document link shared.
>
> Regards,
> Indhumathi M
>
> On 2022/03/17 16:09:10 Shreelekhya Gampa wrote:
> > Hi everyone,
> >
> > Currently, MV in carbon is supported as incremental or full refresh load
> > based on the type of query. Whenever MV is created with Average
> aggregate,
> > a full refresh is done, meaning it reloads the whole MV for any newly
> added
> > segments. This slows down loading. With an incremental data load
> of
> > Average, only the segments that are newly added can be loaded to the MV.
> > With the Average function, incremental loading doesn't work directly as
> the
> > average of avg(col) in MV won't give correct results. To achieve
> > proper results we need the sum and count of the columns.
> >
> > Following is the link to the design document. Please let me know
> > your thoughts about the same.
> >
> https://docs.google.com/document/d/1kPEMCX50FLZcmyzm6kcIQtUH9KXWDIqh-Hco7NkTp80/edit?usp=sharing
> >
> >
> > Thanks,
> > Shreelekhya Gampa
> >
>
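
A minimal SQL sketch of the idea (table and column names are hypothetical): the MV stores the sum and count instead of the average, so newly added segments can be aggregated incrementally and the average is derived at query time.

  -- MV keeps SUM and COUNT so it can be refreshed segment by segment:
  CREATE MATERIALIZED VIEW sales_mv AS
    SELECT country, SUM(price) AS sum_price, COUNT(price) AS cnt_price
    FROM sales GROUP BY country;

  -- An AVG query is answered by combining the partial aggregates;
  -- averaging avg(col) values directly would give wrong results:
  SELECT country, SUM(sum_price) / SUM(cnt_price) AS avg_price
  FROM sales_mv GROUP BY country;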


Re: [ANNOUNCE] Vikram Ahuja as new Apache CarbonData committer

2022-02-13 Thread Kunal Kapoor
Congratulations vikram

On Mon, 14 Feb 2022, 10:36 am Vikram Ahuja, 
wrote:

> Thank you all for your wishes
>
> Regards
>  Vikram Ahuja
>


Re: [VOTE] Apache CarbonData 2.3.0(RC2) release

2022-01-23 Thread Kunal Kapoor
Hi all

PMC vote has passed for Apache Carbondata 2.3.0 release, the result as
below:

+1(binding): 5(Akash, Kumar Vishal, Ravindra, Jean-Baptiste)

+1(non-binding) : 1

Thanks all for your vote.
Kunal Kapoor

On Thu, Jan 20, 2022 at 11:09 PM Indhumathi M 
wrote:

> +1
>
> Regards,
> Indhumathi M
>
> On Wed, 19 Jan 2022 at 9:27 PM, Kunal Kapoor 
> wrote:
>
> > Hi All,
> > I submit the Apache CarbonData 2.3.0(RC2) for your vote.
> >
> >
> > *1.Release Notes:*
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349262&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_e7564140ee4c259084ecff7746af846d0c968ea9_lin
> >
> > *Some key features and improvements in this release:*
> >
> >- Support spatial index creation using data frame
> >- Upgrade prestosql to 333 version
> >- Support Carbondata Streamer tool to fetch data incrementally and
> merge
> >- Support DPP for carbon filters
> >- Alter support for complex types
> >
> >  *2. The tag to be voted upon* : apache-carbondata-2.3.0-rc2
> > <https://github.com/apache/carbondata/tree/apache-carbondata-2.3.0-rc2>
> >
> > Commit: 6db604a6389673194b30e3c45e7252af6400d54b
> > <
> >
> https://github.com/apache/carbondata/commit/6db604a6389673194b30e3c45e7252af6400d54b
> > >
> >
> > *3. The artifacts to be voted on are located here:*
> > https://dist.apache.org/repos/dist/dev/carbondata/2.3.0-rc2/
> >
> > *4. A staged Maven repository is available for review at:*
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1074
> >
> > *5. Release artifacts are signed with the following key:*
> > https://people.apache.org/keys/committer/kunalkapoor.asc
> >
> >
> > Please vote on releasing this package as Apache CarbonData 2.3.0,  The
> > vote will
> > be open for the next 72 hours and passes if a majority of at least three
> +1
> > PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache CarbonData 2.3.0
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Regards,
> > Kunal Kapoor
> >
>


[VOTE] Apache CarbonData 2.3.0(RC2) release

2022-01-19 Thread Kunal Kapoor
Hi All,
I submit the Apache CarbonData 2.3.0(RC2) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349262&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_e7564140ee4c259084ecff7746af846d0c968ea9_lin

*Some key features and improvements in this release:*

   - Support spatial index creation using data frame
   - Upgrade prestosql to 333 version
   - Support Carbondata Streamer tool to fetch data incrementally and merge
   - Support DPP for carbon filters
   - Alter support for complex types

 *2. The tag to be voted upon* : apache-carbondata-2.3.0-rc2
<https://github.com/apache/carbondata/tree/apache-carbondata-2.3.0-rc2>

Commit: 6db604a6389673194b30e3c45e7252af6400d54b
<https://github.com/apache/carbondata/commit/6db604a6389673194b30e3c45e7252af6400d54b>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.3.0-rc2/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1074

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.3.0,  The
vote will
be open for the next 72 hours and passes if a majority of at least three +1
PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.3.0
[ ] 0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


[VOTE] Apache CarbonData 2.3.0(RC1) release

2021-12-20 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.3.0(RC1) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349262&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_e7564140ee4c259084ecff7746af846d0c968ea9_lin

*Some key features and improvements in this release:*

   - Support spatial index creation using data frame
   - Upgrade prestosql to 333 version
   - Support Carbondata Streamer tool to fetch data incrementally and merge
   - Support DPP for carbon filters
   - Alter support for complex types

 *2. The tag to be voted upon* : apache-carbondata-2.3.0-rc1
<https://github.com/apache/carbondata/tree/apache-carbondata-2.3.0-rc1>

Commit: 70065894d02ce2e898b1ed3cd7b0b10f6305db44
<https://github.com/apache/carbondata/commit/70065894d02ce2e898b1ed3cd7b0b10f6305db44>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.3.0-rc1/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1072

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.3.0,  The
vote will
be open for the next 72 hours and passes if a majority of at least three +1
PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.3.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


Re: [VOTE] Apache CarbonData 2.2.0(RC2) release

2021-08-02 Thread Kunal Kapoor
+1

Regards
Kunal Kapoor

On Mon, 2 Aug 2021, 4:53 pm Kumar Vishal,  wrote:

> +1
> Regards
> Kumar Vishal
>
> On Mon, 2 Aug 2021 at 2:28 PM, Indhumathi M 
> wrote:
>
> > +1
> >
> > Regards,
> > Indhumathi M
> >
> > On Mon, Aug 2, 2021 at 12:33 PM Akash Nilugal 
> > wrote:
> >
> > > Hi All,
> > >
> > > I submit the Apache CarbonData 2.2.0(RC2) for your vote.
> > >
> > >
> > > *1.Release Notes:*
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347869&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_d44fca7058ab2c2a2a4a24e02264cc701f7d10b8_lin
> > >
> > >
> > > *Some key features and improvements in this release:*
> > >- Integrate with Apache Spark-3.1
> > >- Leverage Secondary Index till segment level with SI as datamap and
> > SI
> > > with plan rewrite
> > >- Make Secondary Index as a coarse grain datamap and use secondary
> > > indexes for Presto queries
> > >- Support rename SI table
> > >- Support describe column
> > >- Local sort Partition Load and Compaction improvement
> > >- GeoSpatial Query Enhancements
> > >- Improve the table status and segment file writing
> > >- Improve the carbon CDC performance and introduce APIs to UPSERT,
> > > INSERT, UPDATE and DELETE
> > >- Improve clean files and rename performance
> > >
> > > *2. The tag to be voted upon:* apache-carbondata-2.2.0-rc2
> > > https://github.com/apache/carbondata/tree/apache-carbondata-2.2.0-rc2
> > >
> > > Commit: c3a908b51b2f590eb76eb4f4d875cd568dbece40
> > >
> > >
> >
> https://github.com/apache/carbondata/commit/c3a908b51b2f590eb76eb4f4d875cd568dbece40
> > >
> > >
> > > *3. The artifacts to be voted on are located here:*
> > > https://dist.apache.org/repos/dist/dev/carbondata/2.2.0-rc2
> > >
> > > *4. A staged Maven repository is available for review at:*
> > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1071/
> > >
> > >
> > > Please vote on releasing this package as Apache CarbonData 2.2.0,  The
> > vote
> > > will be open for the next 72 hours and passes if a majority of at least
> > > three +1
> > > PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache CarbonData 2.2.0
> > >
> > > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> > >
> > > [ ] -1 Do not release this package because...
> > >
> > >
> > > Regards,
> > > Akash R Nilugal
> > >
> >
>


Re: [VOTE] Apache CarbonData 2.2.0(RC1) release

2021-07-06 Thread Kunal Kapoor
-1,
Many important issues are still open.
Let's wait for these fixes.

Thanks
Kunal Kapoor

On Tue, 6 Jul 2021, 3:44 pm Kumar Vishal,  wrote:

> -1
> Please consider PR 4148
> Regards
> Kumar Vishal
>
> On Tue, 6 Jul 2021 at 12:45 PM, Akash Nilugal 
> wrote:
>
> > Hi All,
> >
> > I submit the *Apache CarbonData 2.2.0(RC1) *for your vote.
> >
> >
> >
> > *1. Release Notes:*
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347869&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_386c7cf69a9d53cc8715137e7dba91958dabef9b_lin
> >
> > *Some key features and improvements in this release:*
> >
> >- Integrate Carbondata with spark-3.1
> >- Leverage Secondary Index till segment level with SI as datamap and
> SI
> > with plan rewrite
> >- Make Secondary Index as a coarse grain datamap and use secondary
> > indexes for Presto queries
> >- Support rename SI table
> >- Local sort Partition Load and Compaction improvement
> >- GeoSpatial Query Enhancements
> >- Improve the table status and segment file writing
> >
> > *2. The tag to be voted upon*: apache-carbondata-2.2.0-rc1
> > <https://github.com/apache/carbondata/tree/apache-carbondata-2.2.0-rc1>
> >
> > *Commit: *d4e5d2337164b34fa19a42a40c03da26ff65ab9e
> > <
> >
> >
> https://github.com/apache/carbondata/commit/d4e5d2337164b34fa19a42a40c03da26ff65ab9e
> > >
> >
> >
> > *3. The artifacts to be voted on are located here:*
> > https://dist.apache.org/repos/dist/dev/carbondata/2.2.0-rc1/
> >
> > *4. A staged Maven repository is available for review at:*
> >
> >
> https://repository.apache.org/content/repositories/orgapachecarbondata-1070/
> >
> >
> > Please vote on releasing this package as Apache CarbonData 2.2.0,  The
> > vote will
> > be open for the next 72 hours and passes if a majority of at least three
> +1
> > PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache CarbonData 2.2.0
> >
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> >
> > [ ] -1 Do not release this package because...
> >
> >
> > Regards,
> > Akash R Nilugal
> >
>


Re: [ANNOUNCE] Akash R Nilugal as new PMC for Apache CarbonData

2021-04-13 Thread Kunal Kapoor
Congratulations akash

On Tue, 13 Apr 2021, 8:43 pm Akash r,  wrote:

> Thank you all
>
>
> Regards,
> Akash R Nilugal
>
> On Sun, Apr 11, 2021, 6:29 PM Liang Chen  wrote:
>
> > Hi
> >
> >
> > We are pleased to announce that Akash R Nilugal as new PMC for Apache
> > CarbonData.
> >
> >
> > Congrats to Akash R Nilugal!
> >
> >
> > Apache CarbonData PMC
> >
>


Re: [DISCUSSION] Support alter schema for complex types

2021-03-29 Thread Kunal Kapoor
+1

On Fri, Mar 26, 2021 at 6:19 PM akshay_nuthala 
wrote:

> No, these and other nested-level operations will be taken care of in the next
> phase.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Support JOIN query with spatial index

2021-03-29 Thread Kunal Kapoor
+1

On Mon, Mar 22, 2021 at 4:07 PM Indhumathi  wrote:

> Hi community,
>
> Currently, carbon supports the IN_POLYGON and IN_POLYGON_LIST UDFs,
> where the user has to manually provide the polygon points (a series of latitude
> and longitude pairs) to query a carbon table based on its spatial index.
>
> This feature will support JOINing tables based on an IN_POLYGON UDF
> filter, where the polygon data exists in a table.
>
> Please find below link of design doc. Please check and give
> your inputs/suggestions.
>
>
> https://docs.google.com/document/d/11PnotaAiEJQK_QvKsHznDy1I9tO4idflW32LstwcLhc/edit#heading=h.yh6qp815dh3p
>
>
> Thanks & Regards,
> Indhumathi M
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
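
For reference, a short SQL sketch of both forms (table names, columns, and coordinates are hypothetical; the join form follows the proposal and its exact syntax is defined in the design doc):

  -- Current usage: polygon points supplied manually as a closed ring.
  SELECT * FROM geo_table
  WHERE IN_POLYGON('116.32 40.12, 116.35 40.12, 116.35 40.15, 116.32 40.12');

  -- Proposed usage: polygon data read from another table via a JOIN.
  SELECT g.* FROM geo_table g
  JOIN polygon_table p ON IN_POLYGON(p.polygon_points)
  WHERE p.region_name = 'region_a';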


Re: Improve carbondata CDC performance

2021-03-29 Thread Kunal Kapoor
+1, agree with ravi's suggestion

On Thu, Mar 11, 2021 at 7:53 PM Ravindra Pesala 
wrote:

> +1
> Instead of doing the cartesian join, we can broadcast the sorted min/max
> with file paths and do the binary search inside the map function.
>
> Thank you
>
> On Wed, 24 Feb 2021 at 13:02, akashrn5  wrote:
>
> > Hi Venu,
> >
> > Thanks for your review.
> >
> > I have replied the same in the document.
> > you are right
> >
> > 1. it's taken care of by grouping extended blocklets on split path and getting
> the
> > min-max at block level
> > 2. we need to group by the file path to avoid duplicates from the
> > dataframe output. I have updated the same in the doc; please have a look.
> >
> > Thanks,
> > Akash R
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
>
> --
> Thanks & Regards,
> Ravi
>


Re: [Discussion]Presto Queries leveraging Secondary Index

2021-03-29 Thread Kunal Kapoor
+1 for the design

On Tue, Mar 23, 2021 at 10:37 AM VenuReddy 
wrote:

> Hi all.!
>
> As discussed in the community meeting held in the last week of Feb 2021, we
> already plan to make Secondary Index a Coarse Grain Datamap in the
> future. It would be more appropriate for this requirement to implement
> Secondary Index as the CG Datamap. Presto queries can leverage the secondary
> index
> in pruning through the datamap interface. Spark queries can
> continue to make use of secondary indexes with the existing approach of query
> plan modification.
>
> Have added the detailed design in the below doc.
>
>
> https://docs.google.com/document/d/1VZlRYqydjzBXmZcFLQ4Ty-lK8RQlYVDoEfIId7vOaxk/edit?usp=sharing
>
> Please review it and let me know your suggestions/inputs.
>
> Thanks,
> Venu Reddy
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-26 Thread Kunal Kapoor
+1


On Fri, Mar 26, 2021 at 2:57 PM Ajantha Bhat  wrote:

> Hi All,
>
> I submit the Apache CarbonData 2.1.1(RC2) for your vote.
>
> *1.Release Notes:*
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349409&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_bb629ffd13f06db9dafa005fe7b737939b88ba5d_lin
>
> *Some key features and improvements in this release:*
>
>- Geospatial index algorithm improvement and UDFs enhancement
>- Adding global sort support for SI segments data files merge operation.
>- Refactor CarbonDataSourceScan without Spark Filter
>- Size control of minor compaction
>- Clean files become data trash manager
>- Fix error when loading string field with high cardinality (local
> dictionary fallback issue)
>
>
>  *2. The tag to be voted upon* : apache-carbondata-2.1.1-rc2
> 
>
> Commit:
> 770ea3967c81abcd61c28c4d9bb557da9ceb4322
> <
>
> https://github.com/apache/carbondata/commit/770ea3967c81abcd61c28c4d9bb557da9ceb4322
> >
>
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.1.1-rc2/
>
> *4. A staged Maven repository is available for review at:*
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1068/
>
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/ajantha.asc
>
>
> Please vote on releasing this package as Apache CarbonData 2.1.1,
> The vote will be open for the next 72 hours and passes if a majority of at
> least
> three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 2.1.1
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Ajantha Bhat
>


Re: [DISCUSSION] Describe complex columns

2021-03-22 Thread Kunal Kapoor
+1 good idea for visualizing complex columns

On Thu, Mar 18, 2021 at 10:35 PM Shreelekhya 
wrote:

> Hi,
>
> Currently, describe formatted displays the column information of a table and
> some additional information. When complex types such as ARRAY, STRUCT, and
> MAP are present in a table, the column definition can be long and
> difficult to read in a nested format.
>
> For complex types, the DESCRIBE output can be formatted to avoid
> long lines for multiple fields. We can pass the complex field name to the
> command and visualize its structure as if it were a table.
>
> DDL Commands:
> DESCRIBE fieldname ON [db_name.]table_name;
> DESCRIBE short [db_name.]table_name;
>
> Please let me know your valid inputs about the same.
> Following is the link to the design document.
>
>
> https://docs.google.com/document/d/1uNCByKR09Up9S2hpiEXYDA8XXIEZ2jI9PMSzMIVJ0Js/edit
>
>
> Thanks,
> Shreelekhya
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
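
A short sketch of the proposed commands, using the DDL given above and a hypothetical table with one nested column:

  -- Hypothetical table with a nested column:
  CREATE TABLE person (
    id INT,
    address STRUCT<city: STRING, pincodes: ARRAY<INT>>
  ) STORED AS carbondata;

  -- Proposed: visualize a single complex field as if it were a table:
  DESCRIBE address ON person;

  -- Proposed: compact listing that avoids long nested definitions:
  DESCRIBE short person;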


Re: DISCUSSION: propose to activate "Issues" of https://github.com/apache/carbondata

2021-03-18 Thread Kunal Kapoor
Then I don't think moving to GitHub Issues would be good. JIRA is much
better suited for our needs.


On Thu, Mar 18, 2021 at 8:48 PM Jean-Baptiste Onofre 
wrote:

> Hi,
>
> I don’t think so. GH Issues is not as "advanced" as Jira.
>
> Regards
> JB
>
> > Le 18 mars 2021 à 16:17, Kunal Kapoor  a
> écrit :
> >
> > Hi,
> > Is there a way to create stories and subtasks in GitHub Issues similar to
> JIRA?
> >
> >
> > On Thu, 18 Mar 2021, 6:12 pm Jean-Baptiste Onofre, 
> wrote:
> >
> >> Hi,
> >>
> >> First of all, I fully agree with Liang.
> >>
> >> I think it makes sense to keep the workflow:
> >>
> >> - issue
> >> - PullRequest
> >> - checks (CI)
> >> - merge
> >>
> >> I think that we can already use GitHub Issues (and even GitHub Actions if
> >> we want).
> >>
> >> Regards
> >> JB
> >>
> >>> Le 18 mars 2021 à 12:59, Liang Chen  a écrit
> :
> >>>
> >>> Hi
> >>>
> >>> As you know, for better community management, I propose to put "Issues,
> >> Pull
> >>> Request, Code" together and request Apache INFRA to activate "Issues"
> of
> >>> github.
> >>>
> >>> This is an open discussion; please share your comments.
> >>>
> >>> Regards
> >>> Liang
> >>
> >>
>
>


Re: DISCUSSION: propose to activate "Issues" of https://github.com/apache/carbondata

2021-03-18 Thread Kunal Kapoor
Hi,
Is there a way to create stories and subtasks in GitHub Issues similar to JIRA?


On Thu, 18 Mar 2021, 6:12 pm Jean-Baptiste Onofre,  wrote:

> Hi,
>
> First of all, I fully agree with Liang.
>
> I think it makes sense to keep the workflow:
>
> - issue
> - PullRequest
> - checks (CI)
> - merge
>
> I think that we can already use GitHub Issues (and even GitHub Actions if
> we want).
>
> Regards
> JB
>
> > Le 18 mars 2021 à 12:59, Liang Chen  a écrit :
> >
> > Hi
> >
> > As you know, for better community management, I propose to put "Issues,
> Pull
> > Request, Code" together and request Apache INFRA to activate "Issues" of
> > github.
> >
> > This is an open discussion; please share your comments.
> >
> > Regards
> > Liang
>
>


Re: [VOTE] Apache CarbonData 2.1.1(RC1) release

2021-03-18 Thread Kunal Kapoor
-1, let's wait for the pending JIRAs to get resolved

On Thu, 18 Mar 2021, 7:30 pm Kumar Vishal, 
wrote:

> -1
> -Regards
> Kumar Vishal
>
> On Thu, 18 Mar 2021 at 6:35 PM, David CaiQiang 
> wrote:
>
> > -1, please fix the pending defect and merge the completed PR at first.
> >
> >
> >
> > -
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>


Re: [Discussion]Presto Queries leveraging Secondary Index

2021-02-24 Thread Kunal Kapoor
+1 on using the index server to leverage the SI index. As discussed earlier, we
would need a segment UDF to enable selective segment reading instead of the
current implementation. The existing setSegmentsToRead API should be
removed later as well.

Please share the design after your POC

On Mon, Jan 18, 2021 at 9:42 AM akashrn5  wrote:

> Hi venu,
>
> Thanks for suggesting.
>
> 1. Option 1 is not a good idea; I think performance will be bad.
> 2. For option 2: we have other indexes, lucene and bloom, where
> distributed pruning happens. Lucene is also an index stored along with the table,
> but not as another table like SI, so we scan lucene in a distributed job and
> then return the index for the filter expression. Similarly we can call
> for SI to scan and prune, but since we need a spark job to do it, we need the
> indexserver, which is the only option.
> So we can use that for scanning, but I'm afraid it may impact other
> concurrent queries, so I would suggest going for a POC with the index
> server, where we will get to know the other bottlenecks of this approach;
> then we can decide and start the design.
>
> If you have already done POC and have some results and design is ready, we
> can review that.
>
> Thanks
>
> Regards
> Akash
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Display the segment ID when carbondata load is successful

2021-02-24 Thread Kunal Kapoor
+1, showing segment id should be enough as other information can be
gathered by other means.

On Thu, Feb 18, 2021 at 1:31 PM Yahui Liu  wrote:

> Hi,
>
> I think after a load, returning only the segment id the data was loaded to is
> enough, no matter whether auto load merge is enabled or not. I will add one more
> reason apart from what @areyouokfreejoe mentioned:
> 1. Because the user already cares about each load, mostly in their
> application
> logic auto load merge is disabled and the user handles compaction by
> themselves. Auto load merge is based only on segment number, not on any
> business relation between the segments. So if they enable auto load merge,
> several segments which have no relation, just close segment ids,
> will
> be compacted. After this kind of compaction, all the information in the
> segments before compaction is lost, and this is not what the user wants. If any
> load is special, then in order not to lose information after compaction,
> this
> load should only merge with segments that share the same special point,
> which is known only to the application; carbon currently has no place to
> store this information. So only the user can control which segments will be
> compacted, by triggering custom compaction with the segment ids of
> segments that share the same special point.
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
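
To make that last point concrete, a hedged sketch (table name and segment ids are hypothetical; the CUSTOM compaction syntax is carbon's documented form):

  -- Each load returns/displays its segment id (the proposal above);
  -- the application keeps the mapping to its own business meaning.
  LOAD DATA INPATH 'hdfs://path/day1.csv' INTO TABLE sales;   -- e.g. segment 3
  LOAD DATA INPATH 'hdfs://path/day2.csv' INTO TABLE sales;   -- e.g. segment 7

  -- Compact only the segments the application knows belong together:
  ALTER TABLE sales COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (3, 7);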


Re: About: Carbon Thrift Server is always hung dead!

2021-02-04 Thread Kunal Kapoor
Yes. Because we are using the existing JDBC process provided by spark,
HA should be supported by following the link in my previous reply.

You can try that solution and if you face any problem then we can discuss.

On Thu, 4 Feb 2021, 1:59 pm jingych,  wrote:

> Hi Kunal Kapoor,
>
> Thanks for your reply.
>
> I've switched the carbon thrift server to Option 1.
> And I'll track the new solution for one or two days, then reply if it's ok.
>
> But I still have a question about the HA solution:
> We are using jdbc to connect to the carbon table.
> So I want to know does the new thrift server solution support the HA?
>
> Thanks!
> Jingych.
>
> -----Original Message-----
> From: Kunal Kapoor [mailto:kunalkapoor...@gmail.com]
> Sent: February 4, 2021 13:15
> To: dev@carbondata.apache.org
> Subject: Re: About: Carbon Thrift Server is always hung dead!
>
> Hi jingych,
>
> 1. Use of CarbonThriftServer has been deprecated by the community since
> 2.0 release. Please use the "spark.sql.extensions" property to configure
> and use carbondata as mentioned here <
> https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md#option-1-starting-thrift-server-with-carbonextensionssince-20
> >
> (Option
> 1).
> 2. HA for carbondata can be achieved by using the existing spark HA
> implementation(
> http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
> ).
>
> Please try the above mentioned solutions and tell us whether the problem
> is solved or not. We can check further on this if some issue persists.
>
> You can join slack for better communication with us using this link <
> https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg
> >
> .
>
> Thank you
> Kunal Kapoor
>
>
> On Thu, Feb 4, 2021 at 6:45 AM jingych  wrote:
>
> > Hello, all!
> >
> > Thanks for the carbondata community, it's really so fast!
> >
> > But recently I was confused about the carbon thrift server. It's
> > always hang up and dead.
> >
> > So I do need your help, please!
> >
> > My environment is:
> > 6nodes: Carbon 2.0 + spark2.4.5 + hadoop2.10, Each node: 16cores +
> > 64GB mem + 1TB disk
> >
> > And here is my thrift server shell:
> > spark-submit
> > --master yarn
> > --num-executors 4
> > --driver-memory 10G
> > --executor-memory 10G
> > --executor-cores 4
> > --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
> >
> > ../carbonlib/apache-carbondata-2.1.0-SNAPSHOT-bin-spark2.4.5-hadoop2.7
> > .2.jar
> >
> > So what's the problem? And is there a HA solution for the thrift server?
> >
> > Thanks!
> >
> > Best regards!
> >
> > 
> >  Jingych
> > 2021-02-04
> >
> >
>
>
>


Re: About: Carbon Thrift Server is always hung dead!

2021-02-03 Thread Kunal Kapoor
Hi jingych,

1. Use of CarbonThriftServer has been deprecated by the community since 2.0
release. Please use the "spark.sql.extensions" property to configure and
use carbondata as mentioned here
<https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md#option-1-starting-thrift-server-with-carbonextensionssince-20>
(Option
1).
2. HA for carbondata can be achieved by using the existing spark HA
implementation(
http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
).

Please try the above mentioned solutions and tell us whether the problem is
solved or not. We can check further on this if some issue persists.

You can join slack for better communication with us using this link
<https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg>
.

Thank you
Kunal Kapoor


On Thu, Feb 4, 2021 at 6:45 AM jingych  wrote:

> Hello, all!
>
> Thanks for the carbondata community, it's really so fast!
>
> But recently I was confused about the carbon thrift server. It's always
> hang up and dead.
>
> So I do need your help, please!
>
> My environment is:
> 6nodes: Carbon 2.0 + spark2.4.5 + hadoop2.10,
> Each node: 16cores + 64GB mem + 1TB disk
>
> And here is my thrift server shell:
> spark-submit
> --master yarn
> --num-executors 4
> --driver-memory 10G
> --executor-memory 10G
> --executor-cores 4
> --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
>
> ../carbonlib/apache-carbondata-2.1.0-SNAPSHOT-bin-spark2.4.5-hadoop2.7.2.jar
>
> So what's the problem? And is there a HA solution for the thrift server?
>
> Thanks!
>
> Best regards!
>
> 
>  Jingych
> 2021-02-04
>
>
>


Re: [DISCUSSION]Merge index property and operations improvement.

2020-11-26 Thread Kunal Kapoor
+1

On Mon, Nov 23, 2020 at 3:12 PM akashrn5  wrote:

> Hi david,
>
> Thanks for reply
>
>  a) remove mergeIndex property and event listener, add mergeIndex as a part
> of loading/compaction transaction.
> ==> yes, this can be done, as already discussed.
>
> b) if the merging index failed, loading/compaction should fail directly.
> ==> Agree to this, same as replied to ajantha
>
> c) keep merge_index command and mark it deprecated.
>   for a new table, maybe it will do nothing.
>   for an old table, maybe we need to tolerate the probable query issue
> (not found index files).
>   It could be a deprecated feature in the future.
> ===> For new tables we can follow something similar to what I replied to ajantha;
> old tables still need it,
> and since we avoid deleting index files immediately for old tables,
> we can avoid the issue, I think.
> As you said, we can completely deprecate it after some time.
>
>
> b). At the end of loading, retrying to finish index merging or
> tablestatus update is a good suggestion.
> ===> Do you mean the retry count for table status update and merge index
> should be higher?
>
> Regards,
> Akash
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: Size control of minor compaction

2020-11-25 Thread Kunal Kapoor
The user has to change the application anyway to add the new property.
If we don't change the property name then at least we can use the existing
major compaction size threshold property instead of adding a new one.

On Tue, 24 Nov 2020, 1:43 pm Zhangshunyu,  wrote:

> Hi Akash, if we change the property name, existing users need to change many
> places, like their application code and cluster config files, to adapt to
> this change. What's your opinion? @David @Ajantha
>
>
>
> -
> My English name is Sunday
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: Size control of minor compaction

2020-11-23 Thread Kunal Kapoor
Hi Zhangshunyu,
We should refactor the code and change the property name from "
carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment would
not be considered for auto compaction). By doing this we can use the same
threshold for major and minor compaction. Let us avoid adding a new property
for a minor compaction size threshold.

Minor compaction would consider the segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and do a force
compaction(existing behaviour).

Thanks
Kunal Kapoor

On Tue, Nov 24, 2020 at 10:32 AM Kunal Kapoor 
wrote:

> Hi Zhangshunyu,
> We should refactor the code and change the property name from "
> carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
> global property that defines the size above which a segment would
> not be considered for auto compaction). By doing this we can use the same
> threshold for major and minor compaction. Let us avoid adding a new property
> for a minor compaction size threshold.
>
> Consider 5 segments when carbon.compaction.size.threshold = 1GB:
>
> Minor compaction would consider the segments based on
> "carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
> Major would consider all segments with size below
> "carbon.compaction.size.threshold".
> Custom compaction should not consider any property and do a force
> compaction(existing behaviour).
>
> Thanks
> Kunal Kapoor
>
> On Tue, Nov 24, 2020 at 7:32 AM Zhangshunyu 
> wrote:
>
>> OK
>>
>>
>>
>> -
>> My English name is Sunday
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>>
>


Re: Size control of minor compaction

2020-11-23 Thread Kunal Kapoor
Hi Zhangshunyu,
We should refactor the code and change the property name from "
carbon.major.compaction.size" to "carbon.compaction.size.threshold" (a
global property that defines the size above which a segment would
not be considered for auto compaction). By doing this we can use the same
threshold for major and minor compaction. Let us avoid adding a new property
for a minor compaction size threshold.

Consider 5 segments when carbon.compaction.size.threshold = 1GB:

Minor compaction would consider the segments based on
"carbon.compaction.size.threshold" and "carbon.compaction.level.threshold".
Major would consider all segments with size below
"carbon.compaction.size.threshold".
Custom compaction should not consider any property and do a force
compaction(existing behaviour).

Thanks
Kunal Kapoor

On Tue, Nov 24, 2020 at 7:32 AM Zhangshunyu  wrote:

> OK
>
>
>
> -
> My English name is Sunday
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
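
A hedged sketch of the proposed behaviour (carbon.compaction.size.threshold is the name proposed above, not an existing property; values are illustrative):

  -- carbon.properties (proposed):
  --   carbon.compaction.size.threshold = 1024    -- MB; segments above this are skipped
  --   carbon.compaction.level.threshold = 4,3    -- existing minor-compaction levels

  ALTER TABLE sales COMPACT 'MINOR';   -- honours size and level thresholds
  ALTER TABLE sales COMPACT 'MAJOR';   -- considers only segments below the size threshold
  ALTER TABLE sales COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0, 1);  -- force, ignores thresholds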


Re: [VOTE] Apache CarbonData 2.1.0(RC2) release

2020-11-08 Thread Kunal Kapoor
Hi all

PMC vote has passed for Apache Carbondata 2.1.0 release, the result as
below:

+1(binding): 5(Ravindra Pesala, David CaiQiang, Liang Chen, Zhichao Zhang,
Kumar Vishal)

+1(non-binding) : 3(Indhumathi, Akash, Ajantha Bhat)


Thanks all for your vote.

On Fri, Nov 6, 2020 at 8:29 AM Liang Chen  wrote:

> +1(binding)
>
> Regards
> Liang
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[VOTE] Apache CarbonData 2.1.0(RC2) release

2020-11-03 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.1.0(RC2) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347868&styleName=Html&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_1b35cd2d01110783b464000df174503b2dc55b0c_lin

*Some key features and improvements in this release:*

   - Support Float and Decimal in the Merge Flow
   - Implement delete and update feature in carbondata SDK.
   - Support array with SI
   - Support IndexServer with Presto Engine
   - Insert from stage command support partition table.
   - Implementing a new Reindex command to repair the missing SI Segments
   - Support Change Column Comment
   - Presto complex type read support
   -  SI global sort support

 *2. The tag to be voted upon* : apache-carbondata-2.1.0-rc2
<https://github.com/apache/carbondata/tree/apache-carbondata-2.1.0-rc2>

Commit: 52b5a2a08b00a4c7cdf34801201f4fa5393b3700
<https://github.com/apache/carbondata/commit/52b5a2a08b00a4c7cdf34801201f4fa5393b3700>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.1.0-rc2/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1065

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.1.0,
The vote will be open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.1.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


Re: [ANN] Indhumathi as new Apache CarbonData committer

2020-10-07 Thread Kunal Kapoor
Congratulations Indhumathi

Regards
Kunal Kapoor

On Wed, 7 Oct 2020, 8:32 pm Indhumathi M,  wrote:

> Thank you all
>
> Regards
> Indhumathi
>
> On Wed, Oct 7, 2020 at 8:21 PM Ravindra Pesala 
> wrote:
>
> > Congrats Indumathi !
> >
> >
> > On Wed, 7 Oct 2020 at 10:29 AM, manish gupta 
> > wrote:
> >
> > > Congratulations Indumathi
> > >
> > > Regards
> > > Manish Gupta
> > >
> > > On Wed, 7 Oct 2020 at 10:23 AM, brijoobopanna <
> brijoobopa...@huawei.com>
> > > wrote:
> > >
> > > > Congrats Indhumathi, best of luck for your new role in the community
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from:
> > > >
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > > >
> > >
> > --
> > Thanks & Regards,
> > Ravi
> >
>


[VOTE] Apache CarbonData 2.1.0(RC1) release

2020-10-04 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.1.0(RC1) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347868&styleName=&projectId=12320220&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED_e759c117bdddcf70c718e535d9f3cea7e882dda3_lout

*Some key features and improvements in this release:*

   - Support Float and Decimal in the Merge Flow
   - Implement delete and update feature in carbondata SDK.
   - Support array with SI
   - Support IndexServer with Presto Engine
   - Insert from stage command support partition table.
   - Implementing a new Reindex command to repair the missing SI Segments
   - Support Change Column Comment

 *2. The tag to be voted upon* : apache-carbondata-2.1.0-rc1
<https://github.com/apache/carbondata/tree/apache-carbondata-2.1.0-rc1>

Commit: acef2998bcdd10204cdabf0dcdb123bbd264f48d
<https://github.com/apache/carbondata/commit/acef2998bcdd10204cdabf0dcdb123bbd264f48d>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.1.0-rc1/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1064


Please vote on releasing this package as Apache CarbonData 2.1.0,
The vote will be open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.1.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


Re: Clean files enhancement

2020-09-27 Thread Kunal Kapoor
+1 for Ravi's comment.
Better to show what would be deleted or moved to trash.


Regards,
Kunal Kapoor

On Thu, Sep 24, 2020 at 8:34 PM Ravindra Pesala 
wrote:

> Hi Vikram,
>
> +1
>
> It is good to remove the automatic cleanup.
> But I am still worried about the clean files command executed by the user as
> well. We need to enhance the clean files command to introduce a dry run that
> prints which segments are going to be deleted and what is left. If the user is OK
> with the dry run result, then they can go for the actual run.
>
> Regards,
> Ravindra.
>
> On Mon, 21 Sep 2020 at 1:27 PM, Vikram Ahuja 
> wrote:
>
> > Hi Ravi and David,
> >
> >
> >
> > 1. All the automatic clean data in the case of load/insert/compact/delete
> >
> > will be removed, so cleaning will only happen when the clean files
> command
> >
> > is called.
> >
> >
> >
> > 2. We will only add the data to trash when we try to clean data which is
> in
> >
> > IN PROGRESS state. In case of Compacted/Marked For Delete it will not be
> >
> > moved to the trash, it will be directly deleted. The user will only be
> able
> >
> > to recover the In Progress segments if the user wants. @Ravi -> Is this
> >
> > okay for trash usage? Only using it for in progress segments.
> >
> >
> >
> > 3. No trash management will be implemented, the data will ONLY BE REMOVED
> >
> > from the trash folder immediately when the clean files command is called.
> >
> > There will be no time to live, the data can be kept in the trash folder
> >
> > until the user triggers the clean files command.
> >
> >
> >
> > Let me know if you have any questions.
> >
> >
> >
> > Vikram Ahuja
> >
> >
> >
> > On Fri, Sep 18, 2020 at 1:43 PM David CaiQiang 
> > wrote:
> >
> >
> >
> > > agree with Ravindra,
> >
> > >
> >
> > > 1. stop all automatic clean data in
> load/insert/compact/update/delete...
> >
> > >
> >
> > > 2. when clean files command clean in-progress or uncertain data, we can
> >
> > > move
> >
> > > them to data trash.
> >
> > > it can prevent deleting useful data by mistake; we have already found this
> >
> > > issue
> >
> > > in some scenarios.
> >
> > > other cases (for example cleaning mark_for_delete/compacted segments)
> > should
> >
> > > not use the data trash folder; clean data directly.
> >
> > >
> >
> > > 3. No need for data trash management; suggest keeping it simple.
> >
> > > The clean files command should support emptying the trash immediately; it
> > will
> >
> > > be enough.
> >
> > >
> >
> > >
> >
> > >
> >
> > > -
> >
> > > Best Regards
> >
> > > David Cai
> >
> > > --
> >
> > > Sent from:
> >
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
> > >
> >
> >
>
> --
> Thanks & Regards,
> Ravi
>
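
A minimal sketch of the dry-run idea in SQL (the OPTIONS key is hypothetical; the exact form depends on the implementation):

  -- First, preview what clean files would delete or move to trash:
  CLEAN FILES FOR TABLE sales OPTIONS('dryrun'='true');

  -- If the preview looks right, run the actual cleanup:
  CLEAN FILES FOR TABLE sales;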


Re: [Discussion] Segment management enhance

2020-09-04 Thread Kunal Kapoor
Hi David,
Then we should keep a mapping from the segment UUID to a virtual segment
number in the table status file as well.
Any API through which the user can get the segment details should return
the virtual segment id instead of the UUID.

On Fri, Sep 4, 2020 at 12:59 PM David CaiQiang  wrote:

> Hi Kunal,
>
>1. The user uses SQL API or other interfaces. This UUID is a transaction
> id, and we already store the timestamp and other information in the
> segment metadata.
>This transaction id can be used in the loading/compaction/update
> operation. We can append this id into the log if needed.
>Git commit id also uses UUID, so we can consider to use it. What
> information do you want to get from the folder name?
>
>2. It is easy to fix the show segment command's issue. Maybe we can sort
> segments by timestamp and UUID to generate the index id. The user can
> continue to use it in other commands.
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [Discussion] Segment management enhance

2020-09-03 Thread Kunal Kapoor
Hi David,
I don't think changing the segment ID to a UUID is a good idea; it will cause
usability issues.

1. Seeing a UUID named directory in the table structure would be weird, and
not informative.
2. The show segments command would also have the same problem.

Thanks
Kunal Kapoor

On Fri, Sep 4, 2020 at 8:38 AM David CaiQiang  wrote:

> [Background]
> 1. In some scenarios, two loading/compaction jobs may write data to the same
> segment, which results in data confusion and impacts some features
> so that they no longer work correctly.
> 2. Loading/compaction/update/delete operations need to clean stale data
> before execution. Cleaning stale data is a high-risk operation: if it hits
> an exception, it can delete valid data. If the system doesn't clean
> stale
> data, then in some scenarios it is added into a new merged index file and
> can be queried.
> 3. Loading/compaction takes a long time, and in some scenarios the lock is
> also held for a long time.
>
> [Motivation & Goal]
> We should avoid data confusion and the risk of cleaning stale data. Maybe we
> can use a UUID as the segment id to avoid these troubles, and perhaps even do
> loading/compaction without the segment/compaction lock.
>
> [Modification]
> 1. segment id
>   Using UUID as segment id instead of the unique numeric value.
>
> 2. segment layout
>  a) move segment data folder into the table folder
>  b) move carbonindexmerge file into Metadata/segments folder,
>
>  tableFolder
> UUID1
>  |_xxx.carbondata
>  |_xxx.carbonindex
> UUID2
> Metadata
>  |_segments
> |_UUID1_timestamp1.segment (segment index summary)
> |_UUID1_timestamp1.carbonindexmerge (segment index detail)
>  |_schema
>  |_tablestatus
> LockFiles
>
>   partitionTableFolder
> partkey=value1
>  |_xxx.carbondata
>  |_xxx.carbonindex
> partkey=value2
> Metadata
>  |_segments
> |_UUID1_timestamp1.segment (segment index summary)
> |_partkey=value1
>   |_UUID1_timestamp1.carbonindexmerge (segment index detail)
> |_partkey=value2
>  |_schema
>  |_tablestatus
> LockFiles
>
> 3. segment management
> Extract a segment interface; it can support open/close, read/write, and a
> segment-level index pruning API.
> The segment should support multiple data source types: file formats (carbon,
> parquet, orc...), HBase...
>
> 4. clean stale data
> it will become an optional operation.
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

2020-07-27 Thread Kunal Kapoor
+1,
It would be great to have write support from presto

Thanks
Kunal Kapoor

On Tue, Jul 14, 2020 at 6:08 PM Akash Nilugal 
wrote:

> Hi Community,
>
> As we know, CarbonData is an indexed columnar data format for fast
> analytics on big data platforms. So
> we have already integrated with query engines like spark and even
> presto. Currently with presto we
> only support querying carbon data files, but we don't yet support
> writing carbon data files
> through the presto engine.
>
> Currently presto is integrated with carbondata only for reading carbondata
> files.
> For this, the store must already be ready, typically written by
> carbon in spark, and the table
> should be in the hive metastore. So using the carbondata connector we are able to read
> the carbondata files, but we
> cannot create a table or load data into the table in presto. So it is a
> somewhat hectic job to read
> carbon files, since they must first be written with other engines.
>
> So here I will be trying to add transactional load support to the
> presto integration for carbon.
>
> I have attached the design document in the Jira, please refer and any
> suggestions or input is most welcome.
>
> https://issues.apache.org/jira/browse/CARBONDATA-3831
>
>
> Regards,
> Akash R.
>


Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread Kunal Kapoor
Agree with david on always using merge index.

+1 for deprecating the property.

On Thu, Jul 9, 2020 at 3:32 PM Ajantha Bhat  wrote:

> Hi, what if there are too many index files in a segment and the user wants to
> finish the load fast and doesn't want to wait for the merge index?
>
> In that case, setting merge index = false can help save load time, and in
> off-peak time the user can create the merge index.
>
> So I still feel we need to fix the issues that exist when merge index = false.
>
> Thanks,
> Ajantha
>
> On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang,  wrote:
>
> > Better to always merge index.
> >
> > -1 for 1,
> >
> > +1 for 2,
> >
> >
> >
> > -
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
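
For context, a hedged sketch of the trade-off Ajantha describes (property name as discussed above; the SEGMENT_INDEX compaction is carbon's documented way to merge index files manually):

  -- In carbon.properties, skip index merge at load time to finish loads faster:
  --   carbon.merge.index.in.segment = false

  LOAD DATA INPATH 'hdfs://path/data.csv' INTO TABLE sales;

  -- Later, during off-peak hours, merge the segment index files manually:
  ALTER TABLE sales COMPACT 'SEGMENT_INDEX';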


Re: [VOTE] Apache CarbonData 2.0.1(RC1) release

2020-06-01 Thread Kunal Kapoor
Hi all

PMC vote has passed for Apache Carbondata 2.0.1 release, the result as
below:

+1(binding): 4(Jacky, David CaiQiang, Liang Chen, Zhichao Zhang)

+1(non-binding) : 1


Thanks all for your vote.


On Mon, Jun 1, 2020 at 7:20 PM David CaiQiang  wrote:

> +1
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[VOTE] Apache CarbonData 2.0.1(RC1) release

2020-06-01 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.0.1(RC1) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12347870

 *2. The tag to be voted upon* :
apache-carbondata-2.0.1-rc1
<https://github.com/apache/carbondata/tree/apache-carbondata-2.0.1-rc1>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.0.1-rc1/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1063/

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc

Please vote on releasing this package as Apache CarbonData 2.0.1,
The vote will be open for the next 4 hours because this is a patch release
and passes if a majority of at least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.0.1

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


Re: [DISCUSSION] About global sort in 2.0.0

2020-05-31 Thread Kunal Kapoor
+1
We can have 2.0.1 as the patch release.

Regards
Kunal Kapoor

On Mon, Jun 1, 2020 at 9:09 AM Jacky Li  wrote:

> Hi All,
>
> In CarbonData version 2.0.0, there is a bug that makes global-sort use an
> incorrect sort value when the sorting column is String.
> This impacts all existing global-sort tables when doing a new load or
> insert into.
>
> So I suggest community should have a patch release to fix this bug ASAP.
> For 2.0.0 version, global-sort on String column is not recommended to use.
>
> Regards,
> Jacky


Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-20 Thread Kunal Kapoor
Hi all

PMC vote has passed for Apache Carbondata 2.0.0 release, the result as
below:


+1(binding): 6(Jacky, Kumar Vishal, Ravindra, David CaiQiang, Liang Chen,
Zhichao Zhang)


+1(non-binding) : 3


Thanks all for your vote.

On Wed, May 20, 2020 at 4:07 PM Kunal Kapoor 
wrote:

> Hi all
>
> PMC vote has passed for Apache Carbondata 2.0.0 release, the result as
> below:
>
>
> +1(binding): 5(Jacky, Kumar Vishal, Ravindra, David CaiQiang, Liang Chen)
>
>
> +1(non-binding) : 4
>
>
> Thanks all for your vote.
>
> On Tue, May 19, 2020 at 7:24 AM 爱在西元前 <371684...@qq.com> wrote:
>
>> +1
>>
>>
>>
>>
>> -- Original Message --
>> From: "Kunal Kapoor"  Sent: Sunday, May 17, 2020, 5:20 PM
>> To: "dev"
>> Subject: [VOTE] Apache CarbonData 2.0.0(RC3) release
>>
>>
>>
>> Hi All,
>>
>> I submit the Apache CarbonData 2.0.0(RC3) for your vote.
>>
>>
>> *1.Release Notes:*
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346046&styleName=Html&projectId=12320220
>>
>> *Some key features and improvements in this release:*
>>
>>    - Adapt to SparkSessionExtensions
>>    - Support integration with spark 2.4.5
>>    - Support heterogeneous format segments in carbondata
>>    - Support write Flink streaming data to Carbon
>>    - Insert from stage command support partition table.
>>    - Support secondary index on carbon table
>>    - Support query of stage files
>>    - Support TimeBased Cache expiration using ExpiringMap
>>    - Improve insert into performance and decrease memory footprint
>>    - Support PyTorch and TensorFlow
>>
>>  *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc3
>> <https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc3>
>> ;
>>
>> Commit: 29d78b78095ad02afde750d89a0e44f153bcc0f3
>> <
>> https://github.com/apache/carbondata/commit/29d78b78095ad02afde750d89a0e44f153bcc0f3>
>> ;
>>
>> *3. The artifacts to be voted on are located here:*
>> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/
>>
>> *4. A staged Maven repository is available for review at:*
>>
>> https://repository.apache.org/content/repositories/orgapachecarbondata-1062/
>>
>> *5. Release artifacts are signed with the following key:*
>> https://people.apache.org/keys/committer/kunalkapoor.asc
>>
>>
>> Please vote on releasing this package as Apache CarbonData 2.0.0,
>> The vote will be open for the next 72 hours and passes if a majority of at
>> least three +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache CarbonData 2.0.0
>>
>> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>>
>> [ ] -1 Do not release this package because...
>>
>>
>> Regards,
>> Kunal Kapoor
>
>


Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-20 Thread Kunal Kapoor
Hi all

PMC vote has passed for Apache Carbondata 2.0.0 release, the result as
below:


+1(binding): 5(Jacky, Kumar Vishal, Ravindra, David CaiQiang, Liang Chen)


+1(non-binding) : 4


Thanks all for your vote.

On Tue, May 19, 2020 at 7:24 AM 爱在西元前 <371684...@qq.com> wrote:

> +1
>
>
>
>
> -- Original message --
> From: "Kunal Kapoor"
> Sent: Sunday, May 17, 2020, 5:20 PM
> To: "dev"
> Subject: [VOTE] Apache CarbonData 2.0.0(RC3) release
>
>
>
> Hi All,
>
> I submit the Apache CarbonData 2.0.0(RC3) for your vote.
>
>
> *1.Release Notes:*
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346046&styleName=Html&projectId=12320220
>
> *Some key features and improvements in this release:*
>
>    - Adapt to SparkSessionExtensions
>    - Support integration with spark 2.4.5
>    - Support heterogeneous format segments in carbondata
>    - Support write Flink streaming data to Carbon
>    - Insert from stage command support partition table.
>    - Support secondary index on carbon table
>    - Support query of stage files
>    - Support TimeBased Cache expiration using ExpiringMap
>    - Improve insert into performance and decrease memory footprint
>    - Support PyTorch and TensorFlow
>
>  *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc3
> <https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc3>;
>
> Commit: 29d78b78095ad02afde750d89a0e44f153bcc0f3
> <
> https://github.com/apache/carbondata/commit/29d78b78095ad02afde750d89a0e44f153bcc0f3>
> ;
>
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/
>
> *4. A staged Maven repository is available for review at:*
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1062/
>
> *5. Release artifacts are signed with the following key:*
> https://people.apache.org/keys/committer/kunalkapoor.asc
>
>
> Please vote on releasing this package as Apache CarbonData 2.0.0,
> The vote will be open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 2.0.0
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Kunal Kapoor


Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-18 Thread Kunal Kapoor
Hi David,
Javadocs are generated only for Java classes, and the mv-plan module does not
contain any Java classes, so there is no javadoc jar.

On Mon, May 18, 2020 at 12:13 PM David CaiQiang 
wrote:

> hi, Kunal
>
>   another question about point 4:
> *4. A staged Maven repository is available for review at:*
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1062/
>
> please check this URL, it doesn't contain javadoc jar.
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1062/org/apache/carbondata/carbondata-mv-plan_2.3/2.0.0/
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-17 Thread Kunal Kapoor
Hi David,
Yeah you are right. I will remove the 2.4 source zip and other files for
the same.

Thanks
Kunal Kapoor

On Mon, May 18, 2020 at 8:25 AM David CaiQiang  wrote:

> I have a doubt about point 3:
> *3. The artifacts to be voted on are located here:*
> https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/
>
> why need to release two same source packages?
>
> apache-carbondata-2.0.0-spark2.3-source-release.zip
> apache-carbondata-2.0.0-spark2.4-source-release.zip
>
> They are the same, besides of zip file name.
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-17 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.0.0(RC3) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346046&styleName=Html&projectId=12320220

*Some key features and improvements in this release:*

   - Adapt to SparkSessionExtensions
   - Support integration with spark 2.4.5
   - Support heterogeneous format segments in carbondata
   - Support write Flink streaming data to Carbon
   - Insert from stage command support partition table.
   - Support secondary index on carbon table
   - Support query of stage files
   - Support TimeBased Cache expiration using ExpiringMap
   - Improve insert into performance and decrease memory footprint
   - Support PyTorch and TensorFlow

 *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc3
<https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc3>

Commit: 29d78b78095ad02afde750d89a0e44f153bcc0f3
<https://github.com/apache/carbondata/commit/29d78b78095ad02afde750d89a0e44f153bcc0f3>

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1062/

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.0.0.
The vote will be open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.0.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


[VOTE] Apache CarbonData 2.0.0(RC2) release

2020-04-30 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.0.0(RC2) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346046&styleName=Html&projectId=12320220

*Some key features and improvements in this release:*

   - Adapt to SparkSessionExtensions
   - Support integration with spark 2.4.5
   - Support heterogeneous format segments in carbondata
   - Support write Flink streaming data to Carbon
   - Insert from stage command support partition table.
   - Support secondary index on carbon table
   - Support query of stage files
   - Support TimeBased Cache expiration using ExpiringMap
   - Improve insert into performance and decrease memory footprint

 *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc2
<https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc2>

Commit: 5006c83094b642f337e125f2804a3e187df8e4a7
<https://github.com/apache/carbondata/commit/5006c83094b642f337e125f2804a3e187df8e4a7>
)

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc2/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1061/

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.0.0.
The vote will be open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.0.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


[VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-01 Thread Kunal Kapoor
Hi All,

I submit the Apache CarbonData 2.0.0(RC1) for your vote.


*1.Release Notes:*
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12346046

*Some key features and improvements in this release:*

   - Adapt to SparkSessionExtensions
   - Support integration with spark 2.4.5
   - Support heterogeneous format segments in carbondata
   - Support write Flink streaming data to Carbon
   - Insert from stage command support partition table.
   - Support secondary index on carbon table
   - Support query of stage files
   - Support TimeBased Cache expiration using ExpiringMap
   - Improve insert into performance and decrease memory footprint

 *2. The tag to be voted upon* : apache-carbondata-2.0.0-rc1
<https://github.com/apache/carbondata/tree/apache-carbondata-2.0.0-rc1>

Commit: a906785f73f297b4a71c8aaeabae82ae690fb1c3
<https://github.com/apache/carbondata/commit/a906785f73f297b4a71c8aaeabae82ae690fb1c3>
)

*3. The artifacts to be voted on are located here:*
https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc1/

*4. A staged Maven repository is available for review at:*
https://repository.apache.org/content/repositories/orgapachecarbondata-1060/

*5. Release artifacts are signed with the following key:*
https://people.apache.org/keys/committer/kunalkapoor.asc


Please vote on releasing this package as Apache CarbonData 2.0.0. The
vote will
be open for the next 72 hours and passes if a majority of at least three +1
PMC votes are cast.

[ ] +1 Release this package as Apache CarbonData 2.0.0

[ ] 0 I don't feel strongly about it, but I'm okay with the release

[ ] -1 Do not release this package because...


Regards,
Kunal Kapoor


Re: Propose to upgrade hive version to 3.1.0

2020-02-21 Thread Kunal Kapoor
Hi Ajantha,
I don't think multiple versions are needed. Still, we can discuss and
conclude whether we need to support multiple versions.

On Fri, Feb 21, 2020 at 4:42 PM Ajantha Bhat  wrote:

> +1,
>
> The current version will still be supported or carbondata will only support
> 3.1.0 after this?
>
> Thanks,
> Ajantha
>
> On Fri, 21 Feb, 2020, 4:39 pm Kunal Kapoor, 
> wrote:
>
> > Hi All,
> >
> > The hive community has already released version 3.1.0 which has a lot of
> > bug fixes and new features.
> > Many of the users have already migrated to 3.1.0 in their production
> > environment and i think its time we should also upgrade the hive-carbon
> > integration to this version.
> >
> > Please go through the release notes
> > <
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343014&styleName=Text&projectId=12310843
> > >
> > for the list of improvements and bug fixes.
> >
> > Regards
> > Kunal Kapoor
> >
>


Propose to upgrade hive version to 3.1.0

2020-02-21 Thread Kunal Kapoor
Hi All,

The hive community has already released version 3.1.0 which has a lot of
bug fixes and new features.
Many of the users have already migrated to 3.1.0 in their production
environment, and I think it's time we should also upgrade the hive-carbon
integration to this version.

Please go through the release notes
<https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343014&styleName=Text&projectId=12310843>
for the list of improvements and bug fixes.

Regards
Kunal Kapoor


Re: [Discussion] Support Secondary Index on Carbon Table

2020-02-05 Thread Kunal Kapoor
+1

Thanks
Kunal Kapoor

On Wed, Feb 5, 2020, 5:33 PM Indhumathi M  wrote:

> Hi Community,
>
> Currently we have datamaps like *default datamaps*, which are block and
> blocklet, *coarse grained datamaps* like bloom, and *fine grained
> datamaps* like lucene,
> which help in better pruning during query. What if we introduce another
> kind of datamap which can hold blockletId as an index? At the initial
> level, we call it an index, which
> will work as a child table to the main table, like we have MV in our current
> code.
>
> Yes, let's introduce the secondary index to the carbon table, which will be
> the child table to the main table, and it can be created on a column like we
> create the lucene datamap,
> where we give index columns to create the index. In a similar way, we create
> a secondary index on a column, so the indexes on these columns will be
> blocklet IDs, which will
> help in better pruning and faster queries when we have a filter query on the
> index column.
>
> Currently we will take it as an index table, and in a later part we will
> make it inline with the datamap interface.
>
> So design document is attached in JIRA, please give your suggestion/inputs.
>
> JIRA Link: CARBONDATA-3680
> <https://issues.apache.org/jira/browse/CARBONDATA-3680>
>
> Thanks & Regards,
> Indhumathi M
>


[DISCUSSION] Hive and Presto Write support + Performance improvement

2020-01-08 Thread Kunal Kapoor
Hi All,
As you all know that carbon has been supporting reading carbontable from
presto and hive for a long time now and its high time that we start
supporting write from presto and hive in 2.0.0 version.

The development would be divided into 2 Phases.

*Phase1 (Hive):*
*1. Support an OutputFormat (MapredCarbonOutputFormat) that allows the user
to write data in carbondata format from hive.*
- Tables would be created in spark, until a solution to create schema
file in hive is found.
- Tables would support the same folder structure as a transactional
table.
- Any carbon specific DDL/DML would not be supported.

*2. Read Performance should be better or equivalent to ORC.*

*Phase2 (Presto): To be done later*
The tasks are the same as for Hive, and any update to the task list would be
made after analysis.

Any suggestions from the community is appreciated.

Thanks
Kunal Kapoor
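
A minimal sketch of the Phase 1 workflow described above, assuming the table
(and its schema file) is created from a carbon-enabled SparkSession named
`carbon` while the writes come from Hive (all names are illustrative):

// Step 1: create the table from Spark, until schema file creation in Hive is solved
carbon.sql("CREATE TABLE hive_write_demo (id INT, name STRING) STORED AS carbondata")
// Step 2: from a Hive session, an insert such as
//   INSERT INTO hive_write_demo VALUES (1, 'a');
// would then be routed through MapredCarbonOutputFormat to write carbondata files.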


Re: [DISCUSSION]: Changes to SHOW METACACHE command

2020-01-05 Thread Kunal Kapoor
Hi Vikram,
I think the proposed change will bring more clarity to the show cache
output.

I have a suggestion: let us have a command which can show the cache
occupied by each of the executors (only applicable for the index server).
Something like *show executor metacache*, where the "executor" keyword is
optional.
This command will not show per-table data, only per-node totals.

Example:
Driver - X bytes
Executor1 - Y bytes
Executor2 - Z bytes

Thanks
Kunal Kapoor

On Fri, Jan 3, 2020 at 11:11 AM Akash Nilugal 
wrote:

> hi,
>
> +1,
> Looks good.
> Can you please update the document with more clear explanation.
>
> Regards,
> Akash
>
> On 2019/12/17 11:55:30, Vikram Ahuja  wrote:
> > Hi All,
> > Please find the attached design document for the same.
> >
> >
> https://docs.google.com/document/d/1qbr8-Ci_tCvh1tuEdxo3xJkLhDksdnfKuZiDtQCUujk/edit?usp=sharing
> >
> > On Tue, Nov 26, 2019 at 10:28 PM Kunal Kapoor 
> > wrote:
> >
> > > Hi Vikram,
> > > What is the background for these changes and what are the benefits this
> > > will add to carbondata?
> > > Better to add a detailed design document in this thread.
> > >
> > > Thanks
> > > Kunal Kapoor
> > >
> > > On Tue, Nov 26, 2019 at 7:01 PM vikramahuja1001 <
> vikramahuja8...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Current result of Show Metacache command:
> > > > <
> > > >
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t423/currentSM.png
> > > >
> > > >
> > > >
> > > > Proposed result of Show metacache command for the Driver and the
> Index
> > > > Server:
> > > > <
> > > >
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t423/proposedSM.png
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from:
> > > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > > >
> > >
> >
>
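
For reference, the existing command plus the variant suggested above would
look roughly like this (the executor form is only a proposal at this point;
the table name is illustrative):

carbon.sql("SHOW METACACHE").show()                 // driver-side cache summary
carbon.sql("SHOW METACACHE ON TABLE db1.t1").show() // per-table cache detail
carbon.sql("SHOW EXECUTOR METACACHE").show()        // proposed: per-node sizes only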


Re: Apply to open 'Issues' tab in Apache CarbonData github

2019-12-30 Thread Kunal Kapoor
Hi,
+1 for Ajantha's suggestion.

The use of the issues tab is not clear from this post. Please give details on
how you are planning to use it.

Thanks
Kunal Kapoor


On Mon, Dec 30, 2019 at 2:03 PM Akash Nilugal 
wrote:

> Hi,
>
> +1 for ajantha's suggestion.
>
> Many Open source communities use slack for any discussion, it has got good
> interface UI and it even supports chat. It will be helpful for any new
> developer who is interested in carbondata.
> Currently many people cannot follow mail chain.
>
> Regards,
> Akash R Nilugal
>
> On 2019/12/24 05:33:22, Ajantha Bhat  wrote:
> > If we are planning the issues tab just to address mailing-list problems, I
> > would suggest we start using "*slack*".
> >
> > Many companies and open source communities use slack (I have used it in the
> > presto sql community). It supports thread-based conversations and searching
> > is easy. It also provides an option to create multiple channels, and it
> > works in China without any VPN.
> >
> > Please have a look at it once.
> >
> > Thanks,
> > Ajantha
> >
> > On Mon, 23 Dec, 2019, 11:31 pm 恩爸, <441586...@qq.com> wrote:
> >
> > > Hi Liang:
> > >   Carbondata users can raise issues in github's issues to ask any
> > > questions; the function is the same as the mailing list. Many Chinese users
> > > can't access the mailing list and are used to using github's issues.
> > >   To track and record real issues, we still need to use Apache JIRA.
> > >
> > >
> > > -- Original --
> > > From: "Liang Chen [via Apache CarbonData Dev Mailing List archive]"<
> > > ml+s1130556n88594...@n5.nabble.com>;
> > > Date: Sun, Dec 22, 2019 09:14 PM
> > > To: "恩爸"<441586...@qq.com>;
> > >
> > > Subject: Re: Apply to open 'Issues' tab in Apache CarbonData github
> > >
> > >
> > >
> > > Hi
> > >
> > > +1 from my side.
> > > One question : what issues should be raised to Apache JIRA? what issues
> > > will
> > > be raised to github's issue ?
> > > It is better to give the clear definition.
> > >
> > > Regards
> > > Liang
> > >
> > >
> > > xm_zzc wrote
> > > > Hi community:
> > > >   I suggest community to open 'Issues' tab in carbondata github
> > > page, we
> > > > can use this feature to collect the information of carbondata users,
> > > like
> > > > this:
> > > https://github.com/apache/incubator-shardingsphere/issues/234 ,
> > > > users can add company information which uses carbondata on it
> > > willingly
> > > > and we can add these info in Carbondata website.
> > > >   what do you think about this?
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Sent from:
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > >
> > >
> > >
> >
>


Re: [DISCUSSION]: Changes to SHOW METACACHE command

2019-11-26 Thread Kunal Kapoor
Hi Vikram,
What is the background for these changes and what are the benefits this
will add to carbondata?
Better to add a detailed design document in this thread.

Thanks
Kunal Kapoor

On Tue, Nov 26, 2019 at 7:01 PM vikramahuja1001 
wrote:

> Current result of Show Metacache command:
> <
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t423/currentSM.png>
>
>
> Proposed result of Show metacache command for the Driver and the Index
> Server:
> <
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t423/proposedSM.png>
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-11-18 Thread Kunal Kapoor
+1


Regards
Kunal Kapoor

On Mon, Nov 18, 2019, 3:47 PM Kumar Vishal 
wrote:

> +1
> -Regards
> Kumar Vishal
>
> On Mon, Oct 21, 2019 at 5:46 PM Akash Nilugal 
> wrote:
>
> > Hi All,
> >
> > Based on further analysis with druid and influxdb, current design fails
> to
> > cover the late data arrived to load. So i have updated the design
> document
> > based on that to support late data and attached in jira. Please help to
> > review it and suggestions are welcomed.
> >
> > Regards,
> > Akash
> >
> > On 2019/09/23 13:42:48, Akash Nilugal  wrote:
> > > Hi Community,
> > >
> > > Timeseries data are simply measurements or events that are tracked,
> > > monitored, downsampled, and aggregated over time. Basically, timeseries
> > > data analysis helps in analyzing or monitoring the aggregated data over
> > > a period of time to take better decisions for business.
> > > Since carbondata supports OLAP datamaps like preaggregate and MV, and
> > > since time series is of utmost importance, we can support timeseries
> > > for carbondata over the MV datamap model.
> > >
> > > Currently carbondata supports timeseries on the preaggregate datamap,
> > > but it is an alpha feature and there are many limitations when we
> > > compare it against existing timeseries databases or projects which
> > > support time series, like apache druid or influxdb. So, in this feature
> > > we can support timeseries while avoiding the limitations of the current
> > > system. After analyzing the existing timeseries databases like influxdb
> > > and apache druid, I have prepared a solution/design document. Any
> > > inputs, improvements or suggestions are most welcome.
> > >
> > > I have created jira
> > https://issues.apache.org/jira/browse/CARBONDATA-3525 for
> > > this. Later i will create sub jiras for tracking.
> > >
> > >
> > > Regards,
> > > Akash R Nilugal
> > >
> >
>
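
As a sketch of the direction being discussed, a timeseries rollup declared
over the MV datamap model might look like this (illustrative syntax under the
proposal, run from a carbon-enabled SparkSession named `carbon`):

carbon.sql("""
  CREATE DATAMAP sales_hourly
  USING 'mv'
  AS SELECT timeseries(order_time, 'HOUR'), SUM(amount)
     FROM sales
     GROUP BY timeseries(order_time, 'HOUR')
""")
// A query grouped by timeseries(order_time, 'HOUR') could then be rewritten
// to read the rollup, and the late-data case discussed above would require
// refreshing the rollup for the affected time buckets.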


Re: [VOTE] Apache CarbonData 1.6.1(RC1) release

2019-10-11 Thread Kunal Kapoor
+1

On Mon, Oct 7, 2019, 8:42 PM Raghunandan S <
carbondatacontributi...@gmail.com> wrote:

> Hi
>
>
> I submit the Apache CarbonData 1.6.1 (RC1) for your vote.
>
>
> 1.Release Notes:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12345993
>
>
> Some key features and improvements in this release:
>
>
>1. Supported adding segment to CarbonData table
>
>
>
>
> [Behaviour Changes]
>
>1. None
>
>
>  2. The tag to be voted upon : apache-carbondata-1.6.1-rc1 (commit:
>
> cabde6252d4a527fbfeb7f17627c6dce3e357f84)
>
>
> https://github.com/apache/carbondata/releases/tag/apache-CarbonData-1.6.1-rc1
>
>
>
> 3. The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/carbondata/1.6.1-rc1/
>
>
>
> 4. A staged Maven repository is available for review at:
>
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1057/
>
>
>
> 5. Release artifacts are signed with the following key:
>
>
> *https://people.apache.org/keys/committer/raghunandan.asc*
>
>
>
> Please vote on releasing this package as Apache CarbonData 1.6.1,  The vote
>
>
> will be open for the next 72 hours and passes if a majority of
>
>
> at least three +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache CarbonData 1.6.1
>
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
>
> [ ] -1 Do not release this package because...
>
>
>
> Regards,
>
> Raghunandan.
>


Re: [DISCUSSION] Support heterogeneous format segments in carbondata

2019-10-06 Thread Kunal Kapoor
+1

On Mon, Sep 30, 2019, 2:44 PM Akash Nilugal  wrote:

> Hi
>
> +1
> One question: are add segment and load data to the main table both
> supported? If yes, how is the segment locking handled, given that we are
> going to add an entry inside the table status with a segment id for the
> added segment?
>
> Regards,
> Akash
>
> On 2019/09/10 14:41:22, Ravindra Pesala  wrote:
> > Hi All,
> >
> >  This discussion is regarding support of other formats in carbon. Already
> > existing customers use other formats like parquet, orc etc., but if they
> > want to migrate to carbon there is no proper solution at hand. So this
> > feature allows all the old data to add as a segment to carbondata .  And
> > during query, it reads old data in its respective format and all new
> > segments will be read in carbon.
> >
> > I have created the design document and attached to the jira. Please
> review
> > it.
> > https://issues.apache.org/jira/browse/CARBONDATA-3516
> >
> >
> > --
> > Thanks & Regards,
> > Ravindra
> >
>
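
A minimal sketch of the intended usage, assuming the ADD SEGMENT syntax from
the design document (the path and option names are illustrative):

carbon.sql("""
  ALTER TABLE sales ADD SEGMENT
  OPTIONS ('path'='hdfs://cluster/warehouse/old_sales', 'format'='parquet')
""")
// The added segment gets an entry in the table status; queries then read it
// with the parquet reader while native segments use the carbon reader.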


Re: [ANNOUNCE] Ajantha as new Apache CarbonData committer

2019-10-03 Thread Kunal Kapoor
Congratulations ajantha

On Thu, Oct 3, 2019, 5:30 PM Liang Chen  wrote:

> Hi
>
>
> We are pleased to announce that the PMC has invited Ajantha as new Apache
> CarbonData committer and the invite has been accepted!
>
> Congrats to Ajantha and welcome aboard.
>
> Regards
>
> Apache CarbonData PMC
>


Re: Time travel/versioning on carbondata.

2019-08-26 Thread Kunal Kapoor
Hi Ravindra,
I have some questions regarding the feature:

1. *What would be the behaviour if the user just fires a 'select * from
table'(non-transaction query)?*
Would we still read the transaction file to get the latest tablestatus
file name or would we keep the latest transaction in cache.
My concern is that this may impact the query performance as the
transaction file grows.

2. *Would the user be able to create child datamaps like 'preaggregate',
'mv', 'bloom' with some transaction id?*
 ex: create datamap dm1 using 'mv' as select * from maintable where
'somefilter' @mmDDHHmmSS
 *I think this scenario can be blocked.*

3.* Would the user be able to clean the retention files using 'clean files'
or some new DDL would be exposed for transaction cleanup?*

4. *Impacted areas should include the index server, as the transaction details
have to be sent to the server for pruning.*

5. *Impacted Area Point 8: 'Alter table operations should open and close
the transaction'*
Does this mean that for each alter operation a transaction entry would
be maintained and the user can query the old schema by specifying the
transaction id before that  operation? If yes then would multiple
versions of schema files be maintained?

6. *Can the user travel both ways in the revert command?*
 First reset to an old transaction id and then come back to the latest
ID?


+1 for not removing the compacted segments immediately to maintain
transaction history.

Thanks
Kunal Kapoor


On Fri, Aug 23, 2019 at 6:08 PM Ravindra Pesala 
wrote:

> Hi All,
>
> CarbonData allows to store the data incrementally and do the Update/Delete
> operations on the stored data. But the user always can access the latest
> state of data at that point of time.
>
> In the current system, it is not possible to access the old version of
> data. And it is not possible to rollback to the old version in case of some
> issues in current version data.
>
> This proposal adds the automatic versioning of data that we store and we
> can access any historical version of that data.
>
>
> The design is attached on the jira
> https://issues.apache.org/jira/browse/CARBONDATA-3500 Please check it.
>
> --
> Thanks & Regards,
> Ravindra.
>


Re: [VOTE] Apache CarbonData 1.6.0(RC3) release

2019-08-14 Thread Kunal Kapoor
+1


Regard
Kunal Kapoor

On Wed, Aug 14, 2019, 3:02 PM Kumar Vishal 
wrote:

> +1
> Regards
> Kumar Vishal
>
> On Wed, 14 Aug 2019 at 14:59, David Cai  wrote:
>
> > +1
> >
>


Re: [VOTE] Apache CarbonData 1.5.4(RC1) release

2019-05-25 Thread Kunal Kapoor
+1

On Sun, May 26, 2019, 11:19 AM Kumar Vishal 
wrote:

> +1
> Regards
> Kumar Vishal
>
> On Sun, 26 May 2019 at 06:38, Jacky Li  wrote:
>
> > +1
> >
> > This version adds some very good features!
> >
> > Regards,
> > Jacky
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>


Re: [ANNOUNCE] Akash as new Apache CarbonData committer

2019-04-25 Thread Kunal Kapoor
Congratulations akash🥂

On Thu, Apr 25, 2019, 7:56 PM Mohammad Shahid Khan <
mohdshahidkhan1...@gmail.com> wrote:

> Congrats Akash
> Regards,
> Mohammad Shahid Khan
>
> On Thu 25 Apr, 2019, 7:51 PM Liang Chen,  wrote:
>
> > Hi all
> >
> > We are pleased to announce that the PMC has invited Akash as new
> > Apache CarbonData
> > committer, and the invite has been accepted!
> >
> > Congrats to Akash and welcome aboard.
> >
> > Regards
> > Apache CarbonData PMC
> >
> >
> > --
> > Regards
> > Liang
> >
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-03-04 Thread Kunal Kapoor
Hi xuchuanyin,
OK, we will be moving the pruning logic to this module.

Please give +1 to the design if you are happy with it.

Thanks
Kunal Kapoor

On Wed, Feb 20, 2019 at 6:25 PM xuchuanyin  wrote:

> Hi kunal,
>
> At last I'd suggest again that the code for pruning procedure should be
> moved to a separate module.
> The earlier we do this, the easier will be if we want to implement other
> types of IndexServer later.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-19 Thread Kunal Kapoor
Hi xuchuanyin,
I have uploaded the version2 of the design document with the desired
changes.
Please review and let me know if anything is missing or needs to be changed.

Thanks
Kunal Kapoor

On Mon, Feb 18, 2019 at 12:15 PM Kunal Kapoor 
wrote:

> Hi xuchuanyin,
> I will expose an interface and the put the same in the design document
> soon.
>
> Thanks for the feedback
> Kunal Kapoor
>
>
> On Wed, Feb 13, 2019 at 8:04 PM ChuanYin Xu 
> wrote:
>
>> Hi kunal, I think we can go further for 2.3 & 4.
>>
>> For 4, I think all functions of IndexServer should be in an individual
>> module. We can think of the IndexServer as an enhancement component for
>> Carbondata. And inside that module we handle the actual pruning logic. On
>> the other side, if we do not have this component, there will be no pruning
>> at all.
>>
>> As a consequence, for 2.3, I think the IndexServer should provide
>> interfaces that will provide pruning services. For example it accepts
>> expressions and returns pruning result.
>>
>> I think only in this way can the IndexServer be more extensible to meet
>> higher requirements.
>>
>>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-17 Thread Kunal Kapoor
Hi xuchuanyin,
I will expose an interface and the put the same in the design document soon.

Thanks for the feedback
Kunal Kapoor


On Wed, Feb 13, 2019 at 8:04 PM ChuanYin Xu  wrote:

> Hi kunal, I think we can go further for 2.3 & 4.
>
> For 4, I think all functions of IndexServer should be in an individual
> module. We can think of the IndexServer as an enhancement component for
> Carbondata. And inside that module we handle the actual pruning logic. On
> the other side, if we do not have this component, there will be no pruning
> at all.
>
> As a consequence, for 2.3, I think the IndexServer should provide
> interfaces that will provide pruning services. For example it accepts
> expressions and returns pruning result.
>
> I think only in this way can the IndexServer be more extensible to meet
> higher requirements.
>
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-17 Thread Kunal Kapoor
Hi yaojinguo,
1. The carbon.input.segments property will restrict the segments that have
to be accessed. Therefore the carbon driver will send only the specified
segments for pruning.
Why do you think that the Cache Server will slow down the query?

On Fri, Feb 15, 2019 at 2:52 PM yaojinguo  wrote:

> nice feature. I still have some questions:
> 1. what's the  impact on set carbon.input.segments command? Index Cache
> Server may make the query slower.
> 2. what's your plan of this feature.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
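
For context, the segment restriction being asked about is a session setting.
A short sketch (database and table names are illustrative):

carbon.sql("SET carbon.input.segments.default.sales=1,3")
carbon.sql("SELECT COUNT(*) FROM sales").show()  // pruning covers only segments 1 and 3
carbon.sql("SET carbon.input.segments.default.sales=*")  // reset to all segments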


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-17 Thread Kunal Kapoor
Hi Dhatchayani,
1. The next query will take care of removing the cache for the deleted
segments. The request is designed to contain the invalid segments as well,
so that the corresponding datamaps can be removed from the cache.
2. No impact on clean files command.
3. ColumnCache will behave in the same way. If alter command is fired then
the cache would be changed accordingly.
4. This is a valid point; the query retry configuration should be disabled
so that the datamaps are cached in the assigned executor only. Even if the
query fails, the carbon driver will take care of the pruning.

Thanks
Kunal Kapoor

On Thu, Feb 14, 2019 at 11:56 PM dhatchayani 
wrote:

> Hi Kunal,
>
> This feature looks great from the design.
>
> Still, I need some more clarifications on the below points.
>
> (1) How segment deletion will be handled? Whether the next query takes care
> of clearing this segment and update the driver map or the delete operation
> will update?
> (2) Is there any impact on the CLEAN FILES command?
> (3) Is there any impact on the COLUMN_META_CACHE property? This is a
> session
> property and can be changed through ALTER command. If this property is
> altered, accordingly the current cache implementation will invalidate the
> datamap cache, in required cases.
> (4) Executor shut down/failures will be handled by the spark cluster
> manager? In between query execution, if some executor fails, then the tasks
> will be re-launched in any other available executors?
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-13 Thread Kunal Kapoor
Hi Xuchuanyin,
Thank you for the suggestion/questions.

1. You are right, the only thing in the spotlight is the pruning; the
datamaps are not important because we would support all types of datamaps.
The bloom datamap line was just an example to illustrate that for bloom we
are already using distributed datamap pruning. I will re-write the same in
a better way.

2.1 We want the index server to run in a different cluster so that it is
centralised.

2.2 We had considered the possibility of using an in-memory DB, but the same
problems would happen with a huge split load (1 million or more). We also
considered other solutions like Elasticsearch, which would be much faster,
but the implementation would have to be done from scratch. For now we are
starting with a less error-prone method, because the existing pruning logic
only has to be moved from driver to executor and no new logic is being
introduced. But we can surely integrate other solutions in the future.

2.3 The start and stop of the index server/client are the only new interfaces
that will be provided; all the existing interfaces will be reused. I'll
update the same in the design soon.

3. Yes, the index server will support multi-tenancy; we are currently trying
to figure out the best way to authorise and authenticate the access for
multiple users.

4. Yes, a separate module would be created, but just to start the server and
client. The other logic would not be moved to this module.

Thanks
Kunal Kapoor




On Wed, Feb 13, 2019 at 6:59 AM xuchuanyin  wrote:

> Hi Kunal,
>   IndexServer is quiet an efficient method to solve the problem of index
> cache and it's great that someone finally tries to implement this. However
> after I went through your design document, I get some questions for this
> and
> I'll explain those as following:
>
> 1. For the 'backgroud' chapter, I think actually it is the type of pruning
> (distribute-pruning or not) that matters, not the type of datamaps (default
> or bloomfilter).
>
> 2. Extensibility of the IndexServer
> 2.1 In the design document, why do you finally choose 'one more spark
> cluster' as the IndexServer?
>
> 2.2 Have you considered other types of IndexServer such as a DB, another
> in-memory storage engine or even treat the current implementation as an
> embedded IndexServer? If yes, Will the base IndexServer be enough
> extensible
> to support other them during your implementation and design?
>
> 2.3 What are the interfaces that the IndexServer will expose to offer
> service? I also didn't get this info.
>
> 3. For the IndexServer, will multiple tenants also be OK?
>
> 4. During coding, will IndexServer be in a separate module?
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-12 Thread Kunal Kapoor
Hi Manish,
Thank you for the suggestions.

1. I will add the impacted areas to the design document.
2. Yes, the mapping has to be updated when an executor is down, and when it
comes back up the scheduling of the splits has to be done accordingly. The
same will be updated in the design.
3. I think the distribution should be based on the index files and not the
segments, so that distribution still happens even when the user has set
only 1 segment for the query.
4. Already updated in the design

On Tue, Feb 12, 2019 at 10:18 PM manishgupta88 
wrote:

> +1
>
> 1. Add the impacted areas in design document.
> 2. If any executor goes down then update the index cache to executor
> mapping
> in driver accordingly.
> 3. Even though the cache would be divided based on index files, the minimum
> unit of cache need to be fixed. Example: 1 segment cache should belong to 1
> executor only.
> 4. One possible suggestion: Instead of reading the splits in Carbon driver,
> let each executor in index server write to a file and pass the
> List[FilePaths] to the driver and let each Carbon Executors to read from
> that path. This is the case when the number of splits exceed the threshold.
>
> Regards
> Manish Gupta
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-12 Thread Kunal Kapoor
Hi Xuchuanyin,
Uploaded the design in PDF format to the JIRA.

Thanks
Kunal Kapoor

On Tue, Feb 12, 2019 at 1:58 PM xuchuanyin  wrote:

> Hi kunal, can you attach the document directly to the jira? I cannot access
> the doc on google drive. Thanks.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[DISCUSSION] Distributed Index Cache Server

2019-02-05 Thread Kunal Kapoor
Hi All,

Carbon currently caches all block/blocklet datamap index information into
the driver. And for bloom type of datamap, it can prune the splits in a
distributed way using distributed datamap pruning. In the first case, there
are limitations like driver memory scale up and reusing of one driver cache
by others is not possible. In the second case, there are limitations like
there is no guarantee that the next query goes to the same executor to
reuse the cache.


Based on the above problems there is a need to have a centralised index
cache server.


Please find below the link for the design document.


https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5



Thanks

Kunal Kapoor
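
As a sketch of how a deployment might opt in once the feature lands (the
property name here is an assumption from this discussion, not a settled API):

import org.apache.carbondata.core.util.CarbonProperties

// Route block/blocklet datamap pruning to the centralised index cache server
CarbonProperties.getInstance()
  .addProperty("carbon.enable.index.server", "true")
// Queries submitted afterwards would ask the index server for pruned splits
// instead of building the datamap cache inside each application driver.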


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-05 Thread Kunal Kapoor
+ JIRA link for tracking purpose

https://issues.apache.org/jira/browse/CARBONDATA-3288

On Tue, Feb 5, 2019 at 4:36 PM Kunal Kapoor 
wrote:

> + JIRA link for tracking purpose
>
>
> On Tue, Feb 5, 2019 at 4:27 PM Kunal Kapoor 
> wrote:
>
>> Hi All,
>>
>> Carbon currently caches all block/blocklet datamap index information into
>> the driver. And for bloom type of datamap, it can prune the splits in a
>> distributed way using distributed datamap pruning. In the first case, there
>> are limitations like driver memory scale up and reusing of one driver cache
>> by others is not possible. In the second case, there are limitations like
>> there is no guarantee that the next query goes to the same executor to
>> reuse the cache.
>>
>>
>> Based on the above problems there is a need to have a centralised index
>> cache server.
>>
>>
>> Please find below the link for the design document.
>>
>>
>>
>> https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5
>>
>>
>>
>> Thanks
>>
>> Kunal Kapoor
>>
>>
>>


Re: [DISCUSSION] Distributed Index Cache Server

2019-02-05 Thread Kunal Kapoor
+ JIRA link for tracking purpose


On Tue, Feb 5, 2019 at 4:27 PM Kunal Kapoor 
wrote:

> Hi All,
>
> Carbon currently caches all block/blocklet datamap index information into
> the driver. And for bloom type of datamap, it can prune the splits in a
> distributed way using distributed datamap pruning. In the first case, there
> are limitations like driver memory scale up and reusing of one driver cache
> by others is not possible. In the second case, there are limitations like
> there is no guarantee that the next query goes to the same executor to
> reuse the cache.
>
>
> Based on the above problems there is a need to have a centralised index
> cache server.
>
>
> Please find below the link for the design document.
>
>
>
> https://docs.google.com/document/d/161NXxrKLPucIExkWip5mX00x2iOPH6bvsuQnCzzp47E/edit?ts=5c542ab4#heading=h.x0qaehgkncz5
>
>
>
> Thanks
>
> Kunal Kapoor
>
>
>


Re: [VOTE] Apache CarbonData 1.5.2(RC2) release

2019-02-01 Thread Kunal Kapoor
+1

Thanks
Kunal Kapoor

On Wed, Jan 30, 2019, 10:54 PM Raghunandan S <
carbondatacontributi...@gmail.com wrote:

> Hi
>
>
> I submit the Apache CarbonData 1.5.2 (RC2) for your vote.
>
>
> 1.Release Notes:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12344321
>
>
> Some key features and improvements in this release:
>
>
>1. Presto Enhancements like supporting Hive metastore and stabilising
> existing Presto features
>
>2. Supported Range sort for faster data loading and improved point query
> performance.
>
>3. Supported Compaction for no-sort loaded segments
>
>4. Supported rename of column names
>
>5. Supported GZIP compressor for CarbonData files.
>
>6. Supported map data type from DDL.
>
>
> [Behavior Changes]
>
>1. If user doesn’t specify sort columns during table creation, default
> sort scope is set to no-sort during data loading
>
>
>  2. The tag to be voted upon : apache-carbondata-1.5.2-rc2 (commit:
>
> 9e0ff5e4c06fecd2dc9253d6e02093f123f2e71b)
>
>
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.2-rc
> <
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.2-rc2
> >
> 2
>
>
>
> 3. The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/carbondata/1.5.2-rc2/
>
>
>
> 4. A staged Maven repository is available for review at:
>
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1038/
>
>
>
> 5. Release artifacts are signed with the following key:
>
>
> *https://people.apache.org/keys/committer/raghunandan.asc*
>
>
>
> Please vote on releasing this package as Apache CarbonData 1.5.2,  The vote
>
>
> will be open for the next 72 hours and passes if a majority of
>
>
> at least three +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache CarbonData 1.5.2
>
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
>
> [ ] -1 Do not release this package because...
>
>
>
> Regards,
>
> Raghunandan.
>


Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-04 Thread Kunal Kapoor
+1

On Sat, Jan 5, 2019, 8:31 AM xm_zzc <441586...@qq.com wrote:

> +1 .
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [ANNOUNCE] Chuanyin Xu as new PMC for Apache CarbonData

2019-01-01 Thread Kunal Kapoor
Congratulations

On Wed, Jan 2, 2019, 9:36 AM xm_zzc <441586...@qq.com wrote:

> Congratulation, Chuanyin Xu.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [ANNOUNCE] Bo Xu as new Apache CarbonData committer

2018-12-07 Thread Kunal Kapoor
Congratulations xubo

On Sat, Dec 8, 2018, 9:53 AM kanaka kumar avvaru <
kanakakumaravv...@gmail.com wrote:

> Congrats Xubo.
>
> -Regards
> Kanaka
>
> On Sat 8 Dec, 2018, 09:41 Raghunandan S  wrote:
>
> > Congrats xubo. Welcome on board
> >
> > On Sat, 8 Dec 2018, 8:37 am Liang Chen,  wrote:
> >
> > > Hi all
> > >
> > > We are pleased to announce that the PMC has invited Bo Xu as new
> > > Apache CarbonData
> > > committer, and the invite has been accepted!
> > >
> > > Congrats to Bo Xu and welcome aboard.
> > >
> > > Regards
> > > Apache CarbonData PMC
> > >
> >
>


Re: [VOTE] Apache CarbonData 1.5.1(RC2) release

2018-12-03 Thread Kunal Kapoor
+1

Regards
Kunal Kapoor

On Mon, Dec 3, 2018, 1:18 PM kanaka wrote:

> +1
>
> I think CARBONDATA-3116 was not introduced in 1.5.1 and, as Ravindra
> mentioned, if it is currently not intended for end users, we can discuss
> and optimize the configuration in the next version.
>
> About the apache commons logging in TableDataMap.java#L78: as we verified
> on our local cluster, the required messages are still shown in the log
> file. So, we can fix it together with the other classes in the next
> version.
>
> Regards,
> Kanaka
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[Discussion] CarbonReader performance improvement

2018-10-28 Thread Kunal Kapoor
Hi All,
I would like to propose some improvements to CarbonReader implementation to
increase the performance.

1. When filter expression is not provided by the user then instead of
calling getSplits method we can list the carbondata files and treat one
file as one split. This would improve the performance as the time in
loading block/blocklet datamap would be avoided.

2. Implement Vectorized Reader and expose a API for the user to switch
between CarbonReader/Vectorized reader. Additionally an API would be
provided for the user to extract the columnar batch instead of rows. This
would allow the user to have a deeper integration with carbon.
Additionally the reduction in method calls for vector reader would improve
the read time.

3. Add concurrent reading functionality to Carbon Reader. This can be
enabled by passing the number of splits required by the user. If the user
passes 2 as the split count for the reader then the user would be returned 2
CarbonReaders with an equal number of RecordReaders in each.
The user can then run each CarbonReader instance in a separate thread to
read the data concurrently.

The performance report would be shared soon.

Any suggestion from the community is greatly appreciated.

Thanks
Kunal Kapoor
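
A sketch of proposal 3 from the caller's side (the split method and its
behaviour follow the proposal above and are assumptions, not a final API;
the path is illustrative):

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("/path/to/table", "_temp").build[AnyRef]()
// Proposed: ask for 2 readers, each holding an equal share of the record readers
val readers = reader.split(2)
// Each returned CarbonReader can then be driven from its own thread to read
// the data concurrently.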


Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-17 Thread Kunal Kapoor
Okay sure. Please let us know the result

On Thu, Oct 18, 2018, 11:01 AM xm_zzc <441586...@qq.com> wrote:

> Hi Kunal Kapoor:
>   I have patched PR#2796 into 1.3.1 and run stream app again, this issue
> does not happen often, I will run for a few days to check whether it works.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-17 Thread Kunal Kapoor
Right.
Can you try to write the segments after cherry picking #2796?

On Wed, Oct 17, 2018 at 3:21 PM xm_zzc <441586...@qq.com> wrote:

> Hi Kunal Kapoor:
>   1.  No;
>   2.  query unsuccessfully, I use Carbon SDK Reader to read that wrong
> segment and it failed too.
>   3.  the schema of the table:
>
> | rt| string
>
> | timestamp_1min| bigint
>
> | timestamp_5min| bigint
>
> | timestamp_1hour   | bigint
>
> | customer_id   | bigint
>
> | transport_id  | bigint
>
> | transport_code| string
>
> | tcp_udp   | int
>
> | pre_hdt_id| string
>
> | hdt_id| string
>
> | status| int
>
> | is_end_user   | int
>
> | transport_type| string
>
> | transport_type_nam| string
>
> | fcip  | string
>
> | host  | string
>
> | cip   | string
>
> | code  | int
>
> | conn_status   | int
>
> | recv  | bigint
>
> | send  | bigint
>
> | msec  | bigint
>
> | dst_prefix| string
>
> | next_type | int
>
> | next  | string
>
> | hdt_sid   | string
>
> | from_endpoint_type| int
>
> | to_endpoint_type  | int
>
> | fcip_view | string
>
> | fcip_country  | string
>
> | fcip_province | string
>
> | fcip_city | string
>
> | fcip_longitude| string
>
> | fcip_latitude | string
>
> | fcip_node_name| string
>
> | fcip_node_name_cn | string
>
> | host_view | string
>
> | host_country  | string
>
> | host_province | string
>
> | host_city | string
>
> | host_longitude| string
>
> | host_latitude | string
>
> | cip_view  | string
>
> | cip_country   | string
>
> | cip_province  | string
>
> | cip_city  | string
>
> | cip_longitude | string
>
> | cip_latitude  | string
>
> | cip_node_name | string
>
> | cip_node_name_cn  | string
>
> | dtp_send  | string
>
> | client_port   | int
>
> | server_ip | string
>
> | server_port   | int
>
> | state | string
>
> | response_code | int
>
> | access_domain | string
>
> | valid | int
>
> | min_batch_time| bigint
>
> | update_time   | bigint
>
> |   |
>
> | ##Detailed Table Information  |
>
> | Database Name | hdt_sys
>
> | Table Name| transport_access_log
>
> | CARBON Store Path | hdfs://hdtcluster/carbon_store
>
> | Comment   |
>
> | Table Block Size  | 512 MB
>
> | Table Data Size   | 777031634135
>
> | Table Index Size  | 72894232
>
> | Last Update Time  | 153976920
>
> | SORT_SCOPE| local_sort
>
> | Streaming | true
>
> | MAJOR_COMPACTION_SIZE | 4096
>
> | AUTO_LOAD_MERGE   | true
>
> | COMPACTION_LEVEL_THRESHOLD| 2,8
>
> |   |
>
> | ##Detailed Column property|
>
> | ADAPTIVE  |
>
> | SORT_COLUMNS  |
> is_end_user,status,customer_id,access_domai

Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-17 Thread Kunal Kapoor
Hi,
I have a few question regarding this Exception:
1. Does the table have any string columns for which the length of the data
exceeds 32k characters?
2. Are you able to query(select *) on the table successfully?
3. Can you share the schema of the table?

Meanwhile I am looking into the possibility of any other thread clearing
the MemoryBlock.

Regards
Kunal Kapoor

On Tue, Oct 16, 2018 at 1:31 PM xm_zzc <441586...@qq.com> wrote:

> Hi Babu:
>   Thanks for your reply.
>   I set enable.unsafe.in.query.processing=false  and
> enable.unsafe.columnpage=false , and test failed still.
>   I think the issue I met is not related to MemoryBlock which is cleaned by
> some other thread. As the test steps I mentioned above, I copy the wrong
> segment and use SDKReader to read data, it failed too, the error  message
> is
> following:
> *java.lang.RuntimeException: java.lang.IllegalArgumentException
> at
>
> org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:120)
> at
>
> org.apache.carbondata.core.scan.result.BlockletScannedResult.fillDataChunks(BlockletScannedResult.java:355)
> at
>
> org.apache.carbondata.core.scan.result.BlockletScannedResult.hasNext(BlockletScannedResult.java:559)
> at
>
> org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectResultInRow(DictionaryBasedResultCollector.java:137)
> at
>
> org.apache.carbondata.core.scan.processor.DataBlockIterator.next(DataBlockIterator.java:109)
> at
>
> org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:49)
> at
>
> org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
> at
>
> org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:1)
> at
>
> org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.hasNext(ChunkRowIterator.java:58)
> at
>
> org.apache.carbondata.hadoop.CarbonRecordReader.nextKeyValue(CarbonRecordReader.java:104)
> at
> org.apache.carbondata.sdk.file.CarbonReader.hasNext(CarbonReader.java:71)
> at
> cn.xm.zzc.carbonsdktest.CarbonSDKTest.main(CarbonSDKTest.java:68)
> Caused by: java.lang.IllegalArgumentException
> at java.nio.Buffer.position(Buffer.java:244)
> at
>
> org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimensionDataChunkStore.putArray(UnsafeVariableLengthDimensionDataChunkStore.java:97)
> at
>
> org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.(VariableLengthDimensionColumnPage.java:58)
> at
>
> org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimensionLegacy(CompressedDimensionChunkFileBasedReaderV3.java:325)
> at
>
> org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimension(CompressedDimensionChunkFileBasedReaderV3.java:266)
> at
>
> org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeColumnPage(CompressedDimensionChunkFileBasedReaderV3.java:224)
> at
>
> org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:118)
> ... 11 more*
>
> when the error occurred, the values of some parameters in
> UnsafeVariableLengthDimensionDataChunkStore.putArray were as follows:
>
> buffer.limit=192000
> buffer.cap=192000
> startOffset=300289
> numberOfRows=32000
> this.dataPointersOffsets=288000
>
> startOffset is bigger than buffer.limit, so error occurred.
>
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
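
For anyone trying to reproduce, the SDK read loop used in the test above looks
roughly like this (the segment path is illustrative; the failure surfaces when
the corrupt column page is decoded during hasNext):

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("./wrong_segment", "_temp").build[AnyRef]()
try {
  while (reader.hasNext) {  // throws on the corrupt blocklet
    val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
  }
} finally {
  reader.close()
}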


Re: [VOTE] Apache CarbonData 1.5.0(RC2) release

2018-10-10 Thread Kunal Kapoor
+1

Regard
Kunal Kapoor

On Wed, Oct 10, 2018, 12:45 AM Ravindra Pesala 
wrote:

> Hi
>
> I submit the Apache CarbonData 1.5.0 (RC2) for your vote.
>
> 1.Release Notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12341006
>
> Some key features and improvements in this release:
>
>1. Supported carbon as Spark datasource using Spark's
>Fileformat interface.
>2. Supported Spark 2.3.2 version
>3. Supported Hadoop 3.1.1 version
>4. Improved compression and performance of non dictionary columns by
>applying an adaptive encoding to them,
>5. Supported MAP datatype in carbon.
>6. Supported ZSTD compression for carbondata files.
>7. Supported C++ interfaces to read carbon through SDK API.
>8. Supported CLI tool for data summary and debug purpose.
>9. Supported BYTE and FLOAT datatypes in carbon.
>10. Limited min/max for large text by introducing a configurable limit
>to avoid file size bloat up.
>11. Introduced multithread write API in SDK to speed up loading and
>query performance.
>12. Supported min/max stats for stream row format to improve query
>performance.
>13. Many Bug fixes and stabilized carbondata.
>
>
>  2. The tag to be voted upon : apache-carbondata-1.5.0.rc2(commit:
> 935cf3a5291a12a39f8c68b32157e26b8b1ef92b)
>
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.5.0-rc2
>
>
> 3. The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/carbondata/1.5.0-rc2/
>
>
> 4. A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecarbondata-1034
>
>
> 5. Release artifacts are signed with the following key:
>
> *https://people.apache.org/keys/committer/ravipesala.asc
> <
> https://link.getmailspring.com/link/1524823736.local-38e60b2f-d8f4-v1.2.1-7e744...@getmailspring.com/9?redirect=https%3A%2F%2Fpeople.apache.org%2Fkeys%2Fcommitter%2Fravipesala.asc&recipient=ZGV2QGNhcmJvbmRhdGEuYXBhY2hlLm9yZw%3D%3D
> >*
>
>
> Please vote on releasing this package as Apache CarbonData 1.5.0,  The vote
>
> will be open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache CarbonData 1.5.0
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Ravindra.
>
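
As one example of exercising a listed feature, ZSTD compression for
carbondata files can be chosen per table. A sketch, assuming a spark-shell
where `carbon` is a carbon-enabled SparkSession (the table name is
illustrative; 1.5.x uses the STORED BY syntax):

carbon.sql("""
  CREATE TABLE events_zstd (id INT, payload STRING)
  STORED BY 'carbondata'
  TBLPROPERTIES ('carbon.column.compressor'='zstd')
""")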


Re: Issues about dictionary and S3

2018-09-29 Thread Kunal Kapoor
Hi aaron
I have raised a PR for issue1
Can you cherry-pick the below commit and try!!!

https://github.com/apache/carbondata/pull/2786

Thanks
Kunal Kapoor

On Wed, Sep 26, 2018, 8:31 PM aaron <949835...@qq.com> wrote:

> Thanks, I've checked already and it works well! Very impressive, quick
> response!
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [ANNOUNCE] Raghunandan as new committer of Apache CarbonData

2018-09-26 Thread Kunal Kapoor
Congratulations raghunandan

On Wed, Sep 26, 2018, 1:04 PM Ravindra Pesala  wrote:

> Congrats Raghu
>
> On Wed, 26 Sep 2018 at 12:53, sujith chacko 
> wrote:
>
> > Congratulations Raghu👍
> >
> > On Wed, 26 Sep 2018 at 12:44 PM, Rahul Kumar 
> > wrote:
> >
> > > congrats Raghunandan !!
> > >
> > >
> > > Rahul Kumar
> > > *Sr. Software Consultant*
> > > *Knoldus Inc.*
> > > m: 9555480074
> > > w: www.knoldus.com  e: rahul.ku...@knoldus.in
> > > 
> > >
> > >
> > > On Wed, Sep 26, 2018 at 12:41 PM Kumar Vishal <
> kumarvishal1...@gmail.com
> > >
> > > wrote:
> > >
> > > > Congratulations Raghunandan.
> > > >
> > > > -Regards
> > > > Kumar Vishal
> > > >
> > > > On Wed, Sep 26, 2018 at 12:36 PM Liang Chen  >
> > > > wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > We are pleased to announce that the PMC has invited Raghunandan as
> > new
> > > > > committer of Apache CarbonData, and the invite has been accepted!
> > > > >
> > > > > Congrats to Raghunandan and welcome aboard.
> > > > >
> > > > > Regards
> > > > > Apache CarbonData PMC
> > > > >
> > > >
> > >
> >
>
>
> --
> Thanks & Regards,
> Ravi
>


Re: Issues about dictionary and S3

2018-09-25 Thread Kunal Kapoor
Hi aaron,
For Issue 2, can you cherry-pick
https://github.com/apache/carbondata/pull/2761 and try?
I think it should solve your problem.

Thanks

On Tue, Sep 25, 2018 at 8:34 PM aaron <949835...@qq.com> wrote:

> Thanks a lot! Looking forward to your good news.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: Issues about dictionary and S3

2018-09-25 Thread Kunal Kapoor
Hi aaron,
1. I have already started working on the issue. It requires some
refactoring as well.
2. I am trying to reproduce this issue. I will update you soon.


Thanks

On Tue, Sep 25, 2018 at 3:55 PM aaron <949835...@qq.com> wrote:

> Hi kunalkapoor,
>
> Thanks very much for your quick response!
>
> 1. For the global dictionary issue, do you have a rough plan for the fix?
> 2. How's the local dictionary bug on Spark 2.3.1?
>
> Looking forward to the fix!
>
> Thanks
> Aaron
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: Issues about dictionary and S3

2018-09-25 Thread Kunal Kapoor
Hi aaron,
I was able to reproduce Issue 1, and it is a bug in the current
implementation. This issue is not related to S3; I was able to reproduce
the same on HDFS as well.

I have created a JIRA, you can track it using the following link
https://issues.apache.org/jira/browse/CARBONDATA-2967

Thanks
Kunal Kapoor

On Mon, Sep 24, 2018 at 12:56 PM aaron <949835...@qq.com> wrote:

> Hi kunalkapoor,
>
> More info for you.
>
> *1. One comment about how to reproduce this* - the query was distributed to
> Spark workers on different nodes for execution.
>
> *2. Detailed stacktrace*
>
> scala> carbon.time(carbon.sql(
>  |   s"""SELECT sum(est_free_app_download), timeseries(date,
> 'MONTH'), country_code
>  |  |FROM store WHERE market_code='apple-store' and
> device_code='ios-phone' and country_code IN ('US', 'CN')
>  |  |GROUP BY timeseries(date, 'MONTH'), market_code,
> device_code, country_code,
> category_id""".stripMargin).show(truncate=false))
> 18/09/23 23:42:42 AUDIT CacheProvider:
> [ec2-dca-aa-p-sdn-16.appannie.org][hadoop][Thread-1]The key
> carbon.query.directQueryOnDataMap.enabled with value true added in the
> session param
> [Stage 0:>  (0 + 2)
> / 2]18/09/23 23:42:46 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID
> 1, 10.2.3.19, executor 1): java.lang.RuntimeException: Error while
> resolving
> filter expression
> at
>
> org.apache.carbondata.core.metadata.schema.table.CarbonTable.resolveFilter(CarbonTable.java:1043)
> at
>
> org.apache.carbondata.core.scan.model.QueryModelBuilder.build(QueryModelBuilder.java:322)
> at
>
> org.apache.carbondata.hadoop.api.CarbonInputFormat.createQueryModel(CarbonInputFormat.java:632)
> at
>
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:419)
> at
> org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at
>
> org.apache.carbondata.core.scan.executor.util.QueryUtil.getTableIdentifierForColumn(QueryUtil.java:401)
> at
>
> org.apache.carbondata.core.scan.filter.FilterUtil.getForwardDictionaryCache(FilterUtil.java:1416)
> at
>
> org.apache.carbondata.core.scan.filter.FilterUtil.getFilterValues(FilterUtil.java:712)
> at
>
> org.apache.carbondata.core.scan.filter.resolver.resolverinfo.visitor.DictionaryColumnVisitor.populateFilterResolvedInfo(DictionaryColumnVisitor.java:60)
> at
>
> org.apache.carbondata.core.scan.filter.resolver.resolverinfo.DimColumnResolvedFilterInfo.populateFilterInfoBasedOnColumnType(DimColumnResolvedFilterInfo.java:119)
> at
>
> org.apache.carbondata.core.scan.filter.resolver.ConditionalFilterResolverImpl.resolve(ConditionalFilterResolverImpl.java:107)
> at
>
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:255)
> at
>
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
> at
>
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
> at
>
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.traverseAndResolveTree(FilterExpressionProcessor.java:254)
> at
>
> o

Re: Issues about dictionary and S3

2018-09-23 Thread Kunal Kapoor
Hi aaron,
Thank you for reporting the issues. Let me have a look into these.
I will reply ASAP.


Thanks
Kunal Kapoor

On Mon, Sep 24, 2018 at 9:07 AM aaron <949835...@qq.com> wrote:

> One typo fix: the Spark version for No. 2 should be 2.2.1, pre-built with
> Hadoop 2.7.2.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


Re: [Discussion] Support for Float and Byte data types

2018-09-18 Thread Kunal Kapoor
Yes, it will support both CarbonSession and the SDK.

On Fri, Sep 14, 2018 at 1:13 PM Jacky Li  wrote:

> I think your proposal will also support CarbonSession, not only the SDK
> and FileFormat, right?
>
> Regards,
> Jacky
>
> > On 14 Sep 2018, at 12:34 PM, Kunal Kapoor wrote:
> >
> > Hi xuchuanyin,
> > Yes, your understanding is correct, and I agree that the documentation
> > has to be updated to mention that for the old store the double data
> > type should be used.
> > For the first phase, let us focus on writing/reading through the SDK
> > and FileFormat.
> > What are your thoughts?
> >
> > On Fri, Sep 14, 2018 at 7:05 AM xuchuanyin 
> wrote:
> >
> >> The actual storage datatype for that column is stored at the ColumnPage
> >> level.
> >> In the previous implementation, columns with literal datatypes 'float'
> >> and 'double' shared the same storage datatype 'double', and you want to
> >> distinguish them by adding support for the storage datatype 'float'.
> >>
> >> Is my understanding right? If so, there will be no compatibility
> >> problems. But we'd better mention that for old stores created prior to
> >> 1.5.0, users can only use double.
> >>
> >>
> >>
> >> --
> >> Sent from:
> >>
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >>
> >
>
>
>
>


Re: [Discussion] Support for Float and Byte data types

2018-09-13 Thread Kunal Kapoor
Hi xuchuanyin,
Yes, your understanding is correct, and I agree that the documentation has
to be updated to mention that for the old store the double data type should
be used.
For the first phase, let us focus on writing/reading through the SDK and
FileFormat.
What are your thoughts?

On Fri, Sep 14, 2018 at 7:05 AM xuchuanyin  wrote:

> The actual storage datatype for that column is stored at the ColumnPage level.
> In the previous implementation, columns with literal datatypes 'float' and
> 'double' shared the same storage datatype 'double', and you want to
> distinguish them by adding support for the storage datatype 'float'.
>
> Is my understanding right? If so, there will be no compatibility problems.
> But we'd better mention that for old stores created prior to 1.5.0, users
> can only use double.
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>


[Discussion] Support for Float and Byte data types

2018-09-13 Thread Kunal Kapoor
Hi dev,
I am working on supporting Float and Byte datatypes.

*Background*
Currently float is supported by internally storing the data as double and
changing the data type to Double. This poses some problems when using
SparkCarbonFileFormat to read float data.
Internally, as the data type is changed from Float to Double, the data is
retrieved as a Double page instead of a Float page.
If the user tries to create a table through the file format specifying
float as the datatype for any column, the query will fail. The user is
*restricted to using double to retrieve the data.*

*Proposed Solution*
Add support for the float data type and store the data as a FloatPage. Most
of the methods that are used for double can be reused for float.

*A similar approach can be used for Byte*.
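
For illustration, a minimal Scala sketch of the file-format scenario
described above, as run from spark-shell (table and column names are
hypothetical, and it assumes the carbon datasource is on the classpath):

// Sketch only: before this change, the float column is physically encoded
// as a double page, so a table declared with FLOAT through the file format
// fails at query time and the user must declare DOUBLE instead. With a
// dedicated FloatPage, the FLOAT declaration works end to end.
spark.sql("CREATE TABLE float_src (f FLOAT) USING carbon")
spark.sql("INSERT INTO float_src VALUES (1.5)")
spark.sql("SELECT f FROM float_src").show()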

Any suggestions from the community are most welcome.

 -Regards
Kunal Kapoor


Re: [DISCUSSION] Updates to CarbonData documentation and structure

2018-09-05 Thread Kunal Kapoor
+1, I agree with sujith.
We can have a separate link on the website (for example, *Breaking changes*)
which can list all the behaviour changes that the new version is
introducing.


Regards,
Kunal Kapoor

On Wed, Sep 5, 2018 at 3:56 PM sujith chacko 
wrote:

> +1,
> As this will improve our documentation quality, I just have one
> suggestion: do we need a migration guide in carbon as well, or a section
> which lists any behaviour changes to expect while migrating between
> carbon versions? This will be very handy when we release our major
> versions.
>
> For example, any behaviour changes in bad-records handling across
> versions, or changes in the behaviour of particular data types, like
> decimal precision/scale handling, etc.
>
> Regards,
> Sujith
>
>
> On Wed, 5 Sep 2018 at 7:39 AM, xuchuanyin  wrote:
>
> > I think even if we split the carbondata commands into DDL and DML, it
> > is still too large for one document.
> >
> > For example, there are many TBLProperties for creating a table in DDL.
> > Some descriptions of the TBLProperties are long, and we currently have
> > no TOC for them. It's difficult to locate one property in the doc.
> >
> > Besides, some parameters can be specified at the system configuration,
> > TBLProperties, and LoadOptions levels at the same time. Where should we
> > describe such a parameter?
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>


Re: error occur when I load data to s3

2018-09-05 Thread Kunal Kapoor
Hi Aaron,
I tried running similar commands in my environment; the load data command
was successful.

From analysing the logs, the exception seems to occur during lock file
creation.
Can you try the same scenario by configuring the `carbon.lock.path`
property in carbon.properties to any HDFS location:

*example:*
carbon.lock.path=hdfs://hacluster/mylockFiles
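
If editing carbon.properties is inconvenient, the same key can also be set
programmatically before the load (a sketch; the HDFS location is the same
placeholder as above):

import org.apache.carbondata.core.util.CarbonProperties

// Set the lock path through CarbonProperties instead of the properties file
CarbonProperties.getInstance()
  .addProperty("carbon.lock.path", "hdfs://hacluster/mylockFiles")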

Thanks
Kunal Kapoor

On Tue, Sep 4, 2018 at 12:17 PM aaron <949835...@qq.com> wrote:

> Hi kunalkapoor, I'd like to give you more debug logs, as below.
>
>
> application/x-www-form-urlencoded; charset=utf-8
> Tue, 04 Sep 2018 06:45:10 GMT
> /aa-sdk-test2/carbon-data/example/LockFiles/concurrentload.lock"
> 18/09/04 14:45:10 DEBUG request: Sending Request: GET
> https://aa-sdk-test2.s3.us-east-1.amazonaws.com
> /carbon-data/example/LockFiles/concurrentload.lock Headers: (Authorization:
> AWS AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=, User-Agent:
> aws-sdk-java/1.7.4 Mac_OS_X/10.13.6
> Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144, Range: bytes=0--1,
> Date: Tue, 04 Sep 2018 06:45:10 GMT, Content-Type:
> application/x-www-form-urlencoded; charset=utf-8, )
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection request:
> [route: {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443][total
> kept
> alive: 1; route allocated: 1 of 15; total allocated: 1 of 15]
> 18/09/04 14:45:10 DEBUG PoolingClientConnectionManager: Connection leased:
> [id: 1][route:
> {s}->https://aa-sdk-test2.s3.us-east-1.amazonaws.com:443][total kept
> alive:
> 0; route allocated: 1 of 15; total allocated: 1 of 15]
> 18/09/04 14:45:10 DEBUG SdkHttpClient: Stale connection check
> 18/09/04 14:45:10 DEBUG RequestAddCookies: CookieSpec selected: default
> 18/09/04 14:45:10 DEBUG RequestAuthCache: Auth cache not set in the context
> 18/09/04 14:45:10 DEBUG RequestProxyAuthentication: Proxy auth state:
> UNCHALLENGED
> 18/09/04 14:45:10 DEBUG SdkHttpClient: Attempt 1 to execute request
> 18/09/04 14:45:10 DEBUG DefaultClientConnection: Sending request: GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1
> 18/09/04 14:45:10 DEBUG wire:  >> "GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Host:
> aa-sdk-test2.s3.us-east-1.amazonaws.com[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Authorization: AWS
> AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "User-Agent: aws-sdk-java/1.7.4
> Mac_OS_X/10.13.6
> Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Range: bytes=0--1[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Date: Tue, 04 Sep 2018 06:45:10
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Content-Type:
> application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "Connection: Keep-Alive[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  >> "[\r][\n]"
> 18/09/04 14:45:10 DEBUG headers: >> GET
> /carbon-data/example/LockFiles/concurrentload.lock HTTP/1.1
> 18/09/04 14:45:10 DEBUG headers: >> Host:
> aa-sdk-test2.s3.us-east-1.amazonaws.com
> 18/09/04 14:45:10 DEBUG headers: >> Authorization: AWS
> AKIAIAQX5F5B2MLQPRGQ:Ap8rHsiPQPYUdcBb2Ojb/MA9q+I=
> 18/09/04 14:45:10 DEBUG headers: >> User-Agent: aws-sdk-java/1.7.4
> Mac_OS_X/10.13.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.144-b01/1.8.0_144
> 18/09/04 14:45:10 DEBUG headers: >> Range: bytes=0--1
> 18/09/04 14:45:10 DEBUG headers: >> Date: Tue, 04 Sep 2018 06:45:10 GMT
> 18/09/04 14:45:10 DEBUG headers: >> Content-Type:
> application/x-www-form-urlencoded; charset=utf-8
> 18/09/04 14:45:10 DEBUG headers: >> Connection: Keep-Alive
> 18/09/04 14:45:10 DEBUG wire:  << "HTTP/1.1 200 OK[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "x-amz-id-2:
>
> ooaOvIUsvupOOYOCVRY7y4TUanV9xJbcAqfd+w31xAkGRptm1blE5E5yMobmKsmRyGj9crhGCao=[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "x-amz-request-id:
> A1AD0240EBDD2234[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Date: Tue, 04 Sep 2018 06:45:11
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Last-Modified: Tue, 04 Sep 2018 06:45:05
> GMT[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "ETag:
> "d41d8cd98f00b204e9800998ecf8427e"[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Accept-Ranges: bytes[\r][\n]"
> 18/09/04 14:45:10 DEBUG wire:  << "Content-Type:
> application/octet-stream[\r][\n]"
> 18/09/04 14:45:10 DEBUG

Re: error occur when I load data to s3

2018-09-03 Thread Kunal Kapoor
Ok. Let me have a look

On Tue, Sep 4, 2018, 8:22 AM aaron <949835...@qq.com> wrote:

> Hi kunalkapoor,
>    It seems that the error is not fixed yet. Do you have any ideas?
>
> thanks
> aaron
>
> aaron:2.2.1 aaron$ spark-shell --executor-memory 4g --driver-memory 2g
> Ivy Default Cache set to: /Users/aaron/.ivy2/cache
> The jars for the packages stored in: /Users/aaron/.ivy2/jars
> :: loading settings :: url =
>
> jar:file:/usr/local/Cellar/apache-spark/2.2.1/lib/apache-carbondata-1.5.0-SNAPSHOT-bin-spark2.2.1-hadoop2.7.2.jar!/org/apache/ivy/core/settings/ivysettings.xml
> com.amazonaws#aws-java-sdk added as a dependency
> org.apache.hadoop#hadoop-aws added as a dependency
> com.databricks#spark-avro_2.11 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
> confs: [default]
> found com.amazonaws#aws-java-sdk;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-support;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-core;1.10.75.1 in central
> found commons-logging#commons-logging;1.1.3 in central
> found org.apache.httpcomponents#httpclient;4.3.6 in local-m2-cache
> found org.apache.httpcomponents#httpcore;4.3.3 in local-m2-cache
> found commons-codec#commons-codec;1.6 in local-m2-cache
> found com.fasterxml.jackson.core#jackson-databind;2.5.3 in central
> found com.fasterxml.jackson.core#jackson-annotations;2.5.0 in
> central
> found com.fasterxml.jackson.core#jackson-core;2.5.3 in central
> found
> com.fasterxml.jackson.dataformat#jackson-dataformat-cbor;2.5.3 in
> central
> found joda-time#joda-time;2.8.1 in central
> found com.amazonaws#aws-java-sdk-simpledb;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-simpleworkflow;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-storagegateway;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-route53;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-s3;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-kms;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-importexport;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-sts;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-sqs;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-rds;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-redshift;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-elasticbeanstalk;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-glacier;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-sns;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-iam;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-datapipeline;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-elasticloadbalancing;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-emr;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-elasticache;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-elastictranscoder;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-ec2;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-dynamodb;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cloudtrail;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cloudwatch;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-logs;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-events;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cognitoidentity;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-cognitosync;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-directconnect;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cloudformation;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-cloudfront;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-kinesis;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-opsworks;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-ses;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-autoscaling;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cloudsearch;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-cloudwatchmetrics;1.10.75.1 in
> central
> found com.amazonaws#aws-java-sdk-swf-libraries;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-codedeploy;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-codepipeline;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-config;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-lambda;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-ecs;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-ecr;1.10.75.1 in central
> found com.amazonaws#aws-java-sdk-c

Re: error occur when I load data to s3

2018-09-03 Thread Kunal Kapoor
Hi aaron,
Many issues like this have been identified in version 1.4. Most of the
issues have been fixed on master and will be released in version 1.5.
The remaining fixes are in progress.
Can you try the same scenario on 1.5 (the master branch)?

Thanks
Kunal Kapoor
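
For reference, a minimal sketch of the S3 session setup under test (bucket,
credentials, and store location are placeholders; it assumes the 1.x
CarbonSession entry point):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Sketch: build a CarbonSession whose store (and therefore its lock files)
// lives on S3, then run the load from the thread below.
val carbon = SparkSession.builder()
  .master("local[*]")
  .appName("S3LoadCheck")
  .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
  .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
  .getOrCreateCarbonSession("s3a://<bucket>/carbon-store")

carbon.sql("LOAD DATA INPATH 'hdfs://localhost:9000/usr/carbon-s3/sample.csv' " +
  "INTO TABLE test_s3_table")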

On Mon, Sep 3, 2018, 5:57 AM aaron <949835...@qq.com> wrote:

> *Update aws-java-sdk and hadoop-aws to the versions below; then
> authorization works:
> com.amazonaws:aws-java-sdk:1.10.75.1, org.apache.hadoop:hadoop-aws:2.7.3*
>
> *But we still cannot load data; the exception is the same.
> carbon.sql("LOAD DATA INPATH
> 'hdfs://localhost:9000/usr/carbon-s3/sample.csv' INTO TABLE
> test_s3_table")*
>
> 18/09/02 21:49:47 ERROR CarbonLoaderUtil: main Unable to unlock Table lock
> for tabledefault.test_s3_table during table status updation
> 18/09/02 21:49:47 ERROR CarbonLoadDataCommand: main
> java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(Native Method)
> at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
> at
> org.apache.hadoop.fs.s3a.S3AOutputStream.write(S3AOutputStream.java:164)
> at
>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStream(S3CarbonFile.java:111)
> at
>
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.getDataOutputStreamUsingAppend(S3CarbonFile.java:93)
> at
>
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStreamUsingAppend(FileFactory.java:276)
> at
> org.apache.carbondata.core.locks.S3FileLock.lock(S3FileLock.java:96)
> at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:41)
> at
>
> org.apache.carbondata.core.locks.AbstractCarbonLock.lockWithRetries(AbstractCarbonLock.java:59)
> at
>
> org.apache.carbondata.processing.util.CarbonLoaderUtil.recordNewLoadMetadata(CarbonLoaderUtil.java:247)
> at
>
> org.apache.carbondata.processing.util.CarbonLoaderUtil.recordNewLoadMetadata(CarbonLoaderUtil.java:204)
> at
>
> org.apache.carbondata.processing.util.CarbonLoaderUtil.readAndUpdateLoadProgressInTableMeta(CarbonLoaderUtil.java:437)
> at
>
> org.apache.carbondata.processing.util.CarbonLoaderUtil.readAndUpdateLoadProgressInTableMeta(CarbonLoaderUtil.java:446)
> at
>
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:263)
> at
>
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:92)
> at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at
>
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
> at org.apache.spark.sql.Dataset.(Dataset.scala:183)
> at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:107)
> at
>
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:96)
> at
> org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:154)
> at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:94)
> at
>
> $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:34)
> at
>
> $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:39)
> at
> $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:41)
> at
> $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:43)
> at
> $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:45)
> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:47)
> at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw.(:49)
> at $line25.$read$$iw$$iw$$iw$$iw$$iw.(:51)
> at $line25.$read$$iw$$iw$$iw$$iw.(:53)
> at $line25.$read$$iw$$iw$$iw.(:55)
> at $line25.$read$$iw$$iw.(:57)
> at $line25.$read$$iw.(:59)
> at $line25.$read.(:61)
> at $line25.$read$.(:65)
> at $line25.$read$.()
> at $line25.$eval$.$print$lzycompute(:7)
> at $line25.$eval$.$print(:6)
> at $line25.$eval.$print()
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>

Re: [DISCUSSION] Remove BTree related code

2018-08-24 Thread Kunal Kapoor
+1 for removing unused code



Regards
Kunal Kapoor


On Fri, Aug 24, 2018, 2:09 PM Ravindra Pesala  wrote:

> +1
> We can remove unused code
>
> Regards,
> Ravindra
>
> On Fri, 24 Aug 2018 at 14:06, Kumar Vishal 
> wrote:
>
> > >
> > > +1
> >
> > Better to remove the BTree code as it is no longer used.
> > -Regards
> > Kumar Vishal
> >
>
>
> --
> Thanks & Regards,
> Ravi
>


Re: [VOTE] Apache CarbonData 1.4.1(RC2) release

2018-08-13 Thread Kunal Kapoor
+1

Regards
Kunal Kapoor

On Fri, Aug 10, 2018, 8:14 AM Ravindra Pesala  wrote:

> Hi
>
>
> I submit the Apache CarbonData 1.4.1 (RC2) for your vote.
>
>
> 1.Release Notes:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12343148
>
> Some key features and improvements in this release:
>
>    1. Supported local dictionary to improve IO and query performance.
>    2. Improved and stabilized the Bloom filter datamap.
>    3. Supported left outer join in the MV datamap (alpha feature).
>    4. Supported driver min/max caching for specified columns and
>    segregated the block and blocklet caches.
>    5. Supported a flat folder structure in carbon to maintain the same
>    folder structure as Hive.
>    6. Supported S3 read and write of carbondata files.
>    7. Supported projection push-down for the struct data type.
>    8. Improved complex datatype compression and performance through
>    adaptive encoding.
>    9. Many bug fixes and stabilized carbondata.
>
>
>  2. The tag to be voted upon : apache-carbondata-1.4.1.rc2(commit:
> a17db2439aa51f6db7da293215f9732ffb200bd9)
>
>
> https://github.com/apache/carbondata/releases/tag/apache-carbondata-1.4.1-rc2
>
>
> 3. The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/carbondata/1.4.1-rc2/
>
>
> 4. A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecarbondata-1032
>
>
> 5. Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/ravipesala.asc
>
>
> Please vote on releasing this package as Apache CarbonData 1.4.1. The vote
>
> will be open for the next 72 hours and passes if a majority of
>
> at least three +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache CarbonData 1.4.1
>
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
>
> [ ] -1 Do not release this package because...
>
>
> Regards,
> Ravindra.
>


Re: Operation not allowed: ALTER TABLE COMPACT(line 1, pos 0)

2018-07-01 Thread Kunal Kapoor
Hi chenxingyu,
I am unable to reproduce the issue. Can you please verify your jars? This
looks like a parsing issue.

Thanks
Kunal Kapoor
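
For reference, a sketch of the same statement run from a CarbonSession (the
table name is taken from the report). If this also fails to parse, an older
carbon jar that predates CUSTOM compaction is likely being picked up on the
classpath:

// CUSTOM compaction of specific segments; this parses only on builds that
// include the custom-compaction feature.
carbon.sql(
  "ALTER TABLE tempdb.trade_item_par_3 COMPACT 'CUSTOM' " +
  "WHERE SEGMENT.ID IN (11,12)")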

On Wed, Jun 27, 2018 at 3:09 PM 陈星宇  wrote:

> Hi,
> I referred to the official documentation to compact a carbondata table
> with the command
> 'ALTER TABLE tempdb.trade_item_par_3 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN
> (11,12)', but got this error:
>
>
> Error: org.apache.spark.sql.AnalysisException: == Parse1 ==
>
>
> Operation not allowed: ALTER TABLE COMPACT(line 1, pos 0)
>
>
> == SQL ==
> ALTER TABLE tempdb.trade_item_par_3 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN
> (11,12)
> ^^^
>
>
> == Parse2 ==
> [1.54] failure: ``;'' expected but `where' found
>
>
> ALTER TABLE tempdb.trade_item_par_3 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN
> (11,12)
>  ^;;
> SQLState:  null
> ErrorCode: 0
>
>
>
> chenxingyu


Re: S3 support

2018-06-24 Thread Kunal Kapoor
Hi David,
Thanks for the suggestions.

1. The memory lock cannot support multiple drivers. The documentation will
be updated with this limitation.
2. I agree that, in case of failure, reverting the changes is necessary. I
will take care of this point.
3. You are right, refresh using the table name would not work. I think we
can introduce refresh by path for this scenario.

Thanks
Kunal Kapoor

On Fri, Jun 22, 2018 at 12:08 PM David CaiQiang 
wrote:

> Hi Kunal,
>  I have some questions.
>
> *Problem (Locking):*
>   Does the memory lock support multiple drivers concurrently loading
> data into the same table? Maybe this limitation should be noted.
>
> *Problem (Write with append mode):*
>   1. Atomicity
>   If the overwrite operation fails, the old file may be destroyed. It
> should be possible to recover the old file.
>
> *Problem (Alter rename):*
> If the table folder is different from the table name, maybe the "refresh
> table" command should be enhanced.
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>

