Re: [PROPOSAL] Update on the Jenkins CarbonData job

2017-03-18 Thread Vimal Das Kammath
+1

On Mar 17, 2017 8:18 PM, "Jean-Baptiste Onofré"  wrote:

> Hi guys,
>
> Tomorrow I plan to update our jobs on Apache Jenkins as follows:
>
> - carbondata-master-spark-1.5 building master branch with Spark 1.5 profile
> - carbondata-master-spark-1.6 building master branch with Spark 1.6 profile
> - carbondata-master-spark-2.1 building master branch with Spark 2.1 profile
> - carbondata-pr-spark-1.5 building PR with Spark 1.5 profile
> - carbondata-pr-spark-1.6 building PR with Spark 1.6 profile
> - carbondata-pr-spark-2.1 building PR with Spark 2.1 profile
>
> I will run some builds to identify any potential issues.
>
> No objections?
>
> Thanks,
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Graduation to a TLP (Top Level Project)

2017-03-04 Thread Vimal Das Kammath
+1
I am excited to see the CarbonData project going for graduation to TLP.
Thanks JB for your hard work on driving this effort!

Regards,
Vimal

On Sat, Mar 4, 2017 at 3:54 AM, Uma gangumalla  wrote:

> +1
>
> Great to see this project reach the graduation stage.
>
> Great work team!
>
> Regards,
> Uma
>
> On Mon, Feb 20, 2017 at 8:28 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi all,
> >
> > Regarding all the work and progress we have made so far in Apache CarbonData, I
> > think it's time to start the discussion about graduation as a new TLP
> (Top
> > Level Project) at the Apache Software Foundation.
> >
> > Graduation means we are a self-sustaining and self-governing community,
> > and ready to be a full participant in the Apache Software Foundation. Of
> > course, it doesn't imply that our community growth is complete or that a
> > particular level of technical maturity has been reached, rather that we
> are
> > on a solid trajectory in those areas. After graduation, we will still
> > periodically report to the ASF Board to ensure continued growth of a
> > healthy community.
> >
> > Graduation is an important milestone for the project and for the users
> > community.
> >
> > A way to think about graduation readiness is through the Apache Maturity
> > Model [1]. I think we satisfy most of the requirements [2].
> > There are some TODOs to address; I will tackle them in the coming days
> > (release guide, security link, ...).
> >
> > Regarding the process, graduation consists of drafting a board
> resolution,
> > which needs to identify the full Project Management Committee, and
> getting
> > it approved by the community, the Incubator, and the Board. Within the
> > CarbonData community, most of these discussions and votes have to be on
> the
> > private@ mailing list.
> >
> > I would like to summarize here some points arguing in favor of
> > graduation:
> > * Project's maturity self-assessment [2]
> > * 600 pull requests in incubation
> > * 5 releases (including RC) performed by two different release managers
> > * 65 contributors
> > * 4 new committers
> > * 713 JIRA issues created, 593 resolved or closed
> >
> > Thoughts ? If you agree, I would like to share the maturity
> > self-assessment on the website.
> >
> > If you want to help me on some TODO tasks, please, ping me by e-mail,
> > Skype, hangout or whatever, to sync together.
> >
> > Thanks !
> > Regards
> > JB
> >
> > [1] http://community.apache.org/apache-way/apache-project-
> > maturity-model.html
> > [2] https://docs.google.com/document/d/12hifkDCfbyramBba1uRHYjwa
> > KEcxAyWMxS9iwJ1_etY/edit?usp=sharing
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


RE: [VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-20 Thread Vimal Das Kammath
+1 (binding)
The build is working fine with sufficient tests.

Regards
Vimal

On Jan 21, 2017 10:25 AM, "Kumar Vishal"  wrote:

> +1
> -Regards
> Kumar Vishal
>
>
> On Jan 21, 2017 08:03, "Jihong Ma"  wrote:
>
> > +1 (binding)
> >
> > Build is successful with clean test results.
> >
> > Regards.
> >
> > Jihong
> >
> > -Original Message-
> > From: Liang Chen [mailto:chenliang6...@gmail.com]
> > Sent: Friday, January 20, 2017 6:21 PM
> > To: dev@carbondata.incubator.apache.org
> > Subject: Re: [VOTE] Apache CarbonData 1.0.0-incubating release (RC2)
> >
> > +1(binding)
> >
> > I checked:
> > - name contains incubating
> > - disclaimer exists
> > - signatures and hash correct
> > - NOTICE good
> > - LICENSE is good
> > - Source files have ASF headers
> > - No unexpected binary files
> > - Can compile from source with "mvn clean -DskipTests -Pbuild-with-format
> > -Pspark-1.6 install"
> >
> > Regards
> > Liang
> >
> > 2017-01-21 9:38 GMT+08:00 Jacky Li :
> >
> > > Please find the build guide as follows:
> > >
> > > Build guide (you need to install Apache Thrift 0.9.3; build with the
> > > command: mvn clean -DskipTests -Pbuild-with-format -Pspark-1.6 install).
> > > Please find the details at:
> > > https://github.com/apache/incubator-carbondata/tree/master/build
> > >
> > >
> > > > On Jan 21, 2017, at 9:36 AM, Jacky Li wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Please vote on releasing the following candidate as Apache
> > > CarbonData(incubating)
> > > > version 1.0.0.
> > > >
> > > > Release Notes:
> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> > > ctId=12320220&version=12338020
> > > >
> > > > Staging Repository:
> > > > https://repository.apache.org/content/repositories/orgapache
> > > carbondata-1009
> > > >
> > > > Git Tag, apache-carbondata-1.0.0-incubating-rc2 at :
> > > > https://git-wip-us.apache.org/repos/asf?p=incubator-carbonda
> > > ta.git;a=commit;h=39efa332be094772daed05976b29241593da309f
> > > >
> > > > Please vote to approve this release:
> > > >
> > > > [ ] +1 Approve the release
> > > > [ ] -1 Don't approve the release (please provide specific comments)
> > > >
> > > > This vote will be open for at least 72 hours. If this vote passes (we
> > > > need at least 3 binding votes, meaning three votes from the PPMC), I
> > > > will forward to gene...@incubator.apache.org for the IPMC votes.
> > > >
> > > > Regards,
> > > > Jacky
> > >
> > >
> >
> >
> > --
> > Regards
> > Liang
> >
>


Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-27 Thread Vimal Das Kammath
+1
-vimal
On Nov 23, 2016 9:39 PM, "Venkata Gollamudi"  wrote:

> Hi All,
>
> CarbonData 0.2.0 has been a good and stable release, with a lot of
> defects fixed and a number of performance improvements.
> https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%
> 20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%
> 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>
> Many major new value-added features are planned for the next version,
> taking CarbonData's capability to the next level,
> such as:
> - IUD(Insert-Update-Delete) support,
> - complete rewrite of the data load flow without Kettle,
> - Spark 2.x support,
> - Standardize CarbonInputFormat and CarbonOutputFormat,
> - alluxio(tachyon) file system support,
> - Carbon thrift format optimization for fast query,
> - Data loading performance improvement and In memory off heap sorting,
> - Query performance improvement using off heap,
> - Support Vectorized batch reader.
>
> https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%
> 20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%
> 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>
> I think it makes sense to change CarbonData's major version to 1.0.0 for
> the next release.
> Please comment and vote on this.
>
> Thanks,
> Ramana
>


Re: [Feature ]Design Document for Update/Delete support in CarbonData

2016-11-22 Thread Vimal Das Kammath
Hi Aniket,

The design looks sound and the documentation is great.
I have a few suggestions.

1) Measure update vs dimension update: In the case of a dimension update,
for example where a user wants to change dept1 to dept2 for all users who
are under dept1, can we just update the dictionary for faster performance?
2) Update semantics (one matching record vs multiple matching records): I
could not understand this section. I wanted to confirm whether we will
support one update statement updating multiple rows.

-Vimal

On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen  wrote:

> Hi  Aniket
>
> Thank you for finishing this good design document. A couple of inputs from
> my side:
>
> 1. Please add the below-mentioned info (RowID definition etc.) to the
> design document as well.
> 2. On page 6: "Schema change operation can run in parallel with Update or
> Delete operations, but not with another schema change operation"; can you
> explain this item?
> 3. Please unify the description: use "CarbonData" to replace "Carbon", and
> unify the description of "destination table" and "target table".
> 4. Is the Update operation's delete delta the same as the Delete
> operation's delete delta?
>
> BTW, it would be much better if you could provide a Google Doc for review
> next time; it is really difficult to give comments based on PDF documents
> :)
>
> Regards
> Liang
>
> Aniket Adnaik wrote
> > Hi Sujith,
> >
> > Please see my comments inline.
> >
> > Best Regards,
> > Aniket
> >
> > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <
>
> > sujithchacko.2010@
>
> > >
> > wrote:
> >
> >> Hi Aniket,
> >>
> >>   It's a well-documented design; I just want to know a few points, like
> >>
> >> a.  Format of the RowID and its datatype
> >>
> >  AA>> The following format can be used to represent a unique rowid:
> >
> >  [<...>]
> >  A simple way would be to use String data type and store it as a text
> > file.
> > However, a more efficient way could be to use Bitsets/Bitmaps as further
> > optimization. Compressed Bitmaps such as Roaring bitmaps can be used for
> > better performance and efficient storage.
> >
> > b.  What is the impact of this feature on select queries? Since the query
> > process has to exclude each deleted record and include the corresponding
> > updated record every time, is any optimization being considered to tackle
> > the query performance issue, since one of the major highlights of Carbon
> > is performance?
> > AA>> Some of the optimizations would be  to cache the deltas to avoid
> > recurrent I/O,
> > to store sorted rowids in delete delta for efficient lookup, and perform
> > regular compaction to minimize the impact on select query performance.
> > Additionally, we may have to explore ways to perform compaction
> > automatically, for example, if more than 25% of rows are read from
> deltas.
> > Please feel free to share if you have any ideas or suggestions.
> >
> > Thanks,
> > Sujith
> >
> > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <
>
> > aniket.adnaik@
>
> > > wrote:
> >
> >> Hi All,
> >>
> >> Please find a design doc for Update/Delete support in CarbonData.
> >>
> >> https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?
> >> usp=sharing
> >>
> >> Best Regards,
> >> Aniket
> >>
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/Feature-Design-
> Document-for-Update-Delete-support-in-CarbonData-tp3043p3093.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>
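
The delete-delta idea discussed in this thread can be illustrated with a
small, self-contained sketch (not CarbonData code): deleted row ids are
tracked in a compressed Roaring bitmap, and the scan skips any row id present
in the delta. This assumes the org.roaringbitmap:RoaringBitmap library; all
names here are illustrative.

    import org.roaringbitmap.RoaringBitmap;

    public class DeleteDeltaSketch {
      public static void main(String[] args) {
        RoaringBitmap deletedRows = new RoaringBitmap();
        deletedRows.add(3);          // row ids removed by a DELETE/UPDATE
        deletedRows.add(4);
        deletedRows.runOptimize();   // compress runs for compact storage

        // At scan time, emit only rows absent from the delete delta.
        for (int rowId = 0; rowId < 6; rowId++) {
          if (!deletedRows.contains(rowId)) {
            System.out.println("emit row " + rowId);
          }
        }
      }
    }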


Re: Please vote and advise on building thrift files

2016-11-22 Thread Vimal Das Kammath
+1 for solution 1
We should also have a mechanism to ensure that the jar in the repo is in sync
with the thrift files in git. Can we automate this in CI?
-Vimal

On Mon, Nov 21, 2016 at 8:02 AM, Venkata Gollamudi 
wrote:

> +1 for solution 1.
>
> Solution 2 would be good if the amount of generated code were small. But
> the CarbonData thrift-generated code is around 10k, which is a lot. It
> will also later have to support a C++ interface.
>
> On Sun, Nov 20, 2016, 10:53 PM Fu Chen  wrote:
>
> > +1 for Proposal 1
> > > On Nov 18, 2016, at 11:58 AM, sujith chacko wrote:
> > >
> > > +1 for first approach.
> > >
> > > On Nov 17, 2016 9:58 AM, "金铸"  wrote:
> > >
> > >> +1 for proposal 1
> > >>
> > >>
> > >>> On 2016/11/17 12:13, 邢冰 wrote:
> > >>
> > >>> +1 for proposal 1
> > >>>
> > >>> thx
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Sent from NetEase Mail Master
> > >>> On 11/17/2016 12:09, Ravindra Pesala wrote:
> > >>> +1 for proposal 1
> > >>>
> > >>> On 17 November 2016 at 08:23, Xiaoqiao He 
> wrote:
> > >>>
> > >>> +1 for proposal 1.
> > 
> >  On Thu, Nov 17, 2016 at 10:31 AM, ZhuWilliam <
> allwefant...@gmail.com>
> >  wrote:
> > 
> >  +1 for proposal 1 .
> > >
> > > Auto-generated code should not be added to the project. Also, most of
> > > the time, people who dive into CarbonData may not touch the format
> > > code.
> > >
> > >
> > >
> > > --
> > > View this message in context: http://apache-carbondata-
> > > mailing-list-archive.1130556.n5.nabble.com/Please-vote-and-
> > > advise-on-building-thrift-files-tp2952p2957.html
> > > Sent from the Apache CarbonData Mailing List archive mailing list
> > > archive
> > > at Nabble.com.
> > >
> > >
> > >>>
> > >>> --
> > >>> Thanks & Regards,
> > >>> Ravi
> > >>>
> > >>
> > >>
> > >>
> > >>
> >
> >
>


Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-09 Thread Vimal Das Kammath
+1

-Vimal
On Nov 10, 2016 4:47 AM, "Liang Chen"  wrote:

> Hi all,
>
> I submit the CarbonData 0.2.0-incubating to your vote.
>
> Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220&version=12337896
>
> Staging Repository:
> https://repository.apache.org/content/repositories/
> orgapachecarbondata-1006
>
> Git Tag:
> carbondata-0.2.0-incubating
>
> Please vote to approve this release:
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours. If this vote passes (we need
> at least 3 binding votes, meaning three votes from the PPMC), I will
> forward to gene...@incubator.apache.org for  the IPMC votes.
>
> Here is my vote : +1 (binding)
>
> Regards
> Liang
>


Re: please vote and comment: remove thrift solution

2016-10-24 Thread Vimal Das Kammath
Yes, I agree with Ravindra and Vishal.
Instead of the source code, we can have the jar published to a repository.
We can have a Maven profile that depends on carbon-format as a jar from the
repository (this can be the default, so that new developers can build
successfully).
However, we should also have a profile that compiles the thrift files and
publishes the jar, which can be used by CI and by developers who want to
modify the .thrift files.

Regards
Vimal

On Mon, Oct 24, 2016 at 3:13 PM, Kumar Vishal 
wrote:

> In case of any update to the thrift files, we need to update the Java
> files. I think keeping the Java files is not a good idea.
>
> -Regards
>
> Kumar Vishal
>
> On Oct 24, 2016 13:13, "caiqiang"  wrote:
>
> > Hi
> >
> > Currently, there are two typical Thrift issues:
> >
> > 1. Users want to build Apache CarbonData directly, without needing to
> > install Thrift in advance. For example:
> >
> > Julian Hyde-3’s feedback on the IPMC mailing list: I was not able to build
> > (not your fault - I did not have thrift
> > installed and didn't have the time & patience to install it).
> >
> > 2. We need to fix Apache Jenkins CI issues, like the one below:
> >
> > [ERROR] Failed to execute goal org.apache.thrift.tools:maven-
> thrift-plugin:0.1.11:compile
> > (generate-thrift-java) on project carbondata-format: thrift did not exit
> > cleanly. Review output for more information. -> [Help 1]
> >
> > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
> execute
> > goal org.apache.thrift.tools:maven-thrift-plugin:0.1.11:compile
> > (generate-thrift-java) on project carbondata-format: thrift did not exit
> > cleanly. Review output for more information.
> >
> > To solve the above-mentioned Thrift issues, I would like to propose one
> > solution: directly use the Java code (generated by the thrift compiler
> > from the carbondata-format files) to build, so that users don't need to
> > do any Thrift installation.
> >
> > Please vote and comment:
> >
> > To continue using the current manual-install method to build,
> >
> > or
> >
> > To directly use the Java code generated by the thrift compiler to build.
>


Re: please vote and comment: remove thrift solution

2016-10-24 Thread Vimal Das Kammath
+1 for directly using the Java code to build.
At the same time, we should have the CI infrastructure to ensure that the
.thrift files and the generated source code are always in sync.

Regards
Vimal

On Mon, Oct 24, 2016 at 2:09 PM, Liang Chen  wrote:

> Hi
>
> I prefer the new solution for fixing the Thrift issues: directly use the
> Java code (generated by the thrift compiler from the carbondata-format
> files) to build, so that users don't need to do any Thrift installation.
>
> +1 for new solution.
>
> Regards
> Liang
>
> QiangCai wrote
> > Hi
> >
> > Currently, there are two typical Thrift issues:
> >
> > 1. Users want to build Apache CarbonData directly, without needing to
> > install Thrift in advance. For example:
> >
> > Julian Hyde-3’s feedback on the IPMC mailing list: I was not able to build
> > (not your fault - I did not have thrift
> > installed and didn't have the time & patience to install it).
> >
> > 2. We need to fix Apache Jenkins CI issues, like the one below:
> >
> > [ERROR] Failed to execute goal
> > org.apache.thrift.tools:maven-thrift-plugin:0.1.11:compile
> > (generate-thrift-java) on project carbondata-format: thrift did not exit
> > cleanly. Review output for more information. -> [Help 1]
> >
> > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
> execute
> > goal org.apache.thrift.tools:maven-thrift-plugin:0.1.11:compile
> > (generate-thrift-java) on project carbondata-format: thrift did not exit
> > cleanly. Review output for more information.
> >
> > To solve the above-mentioned Thrift issues, I would like to propose one
> > solution: directly use the Java code (generated by the thrift compiler
> > from the carbondata-format files) to build, so that users don't need to
> > do any Thrift installation.
> >
> > Please vote and comment:
> >
> > To continue using the current manual-install method to build,
> >
> > or
> >
> > To directly use the Java code generated by the thrift compiler to build.
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/please-vote-and-
> comment-remove-thrift-solution-tp2253p2254.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>


[GitHub] incubator-carbondata pull request #231: [CARBONDATA-311]Log the data size of...

2016-10-17 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/231#discussion_r83780338
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/CarbonFactDataWriterImplForIntIndexAndAggBlock.java
 ---
@@ -250,6 +254,7 @@ public NodeHolder 
buildDataNodeHolder(IndexStorage[] keyStorageArray, byt
 }
 long blockletDataSize =
 holder.getKeyArray().length + holder.getDataArray().length + 
indexBlockSize;
+LOGGER.info("A new blocklet is added, its data size is: " + 
blockletDataSize + " Byte");
--- End diff --

Move this line after the call to writeDataToFile, because if writeDataToFile
fails, we should not log that a blocklet was added.
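
A sketch of the suggested ordering (writeDataToFile is the method named in
this review; its exact signature is assumed):

    // Log only after the write succeeds; nothing is logged if it throws.
    writeDataToFile(holder);
    LOGGER.info("A new blocklet is added, its data size is: " + blockletDataSize + " Byte");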




Re: Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-17 Thread Vimal Das Kammath
The keys in a map can only be primitive data types. At present, CarbonData
supports the following primitive data types: Integer, String, Timestamp,
Double and Decimal.
If CarbonData adds support for more primitive data types in the future,
those can likewise be used as map keys.

The reason for restricting keys to primitive data types is that, if keys
were complex data types, then looking up a value by key would not be
possible in a SQL statement.

On Mon, Oct 17, 2016 at 7:43 AM, Liang Chen  wrote:

> Hi Vimal
>
> Thank you started the discussion.
> Since keys of Map data can only be primitive, can you list the types which
> will be supported? (Int, String, Double, ...)
>
> For more convenient discussion, you can go ahead and use Google Docs.
> After the design document is finalized, please archive and upload it to
> the cwiki: https://cwiki.apache.org/confluence/display/
> CARBONDATA/CarbonData+Home
>
> Regards
> Liang
>
>
> Vimal Das Kammath wrote
> > Hi All,
> >
> > This discussion is regarding support for Map Data type in Carbon Data.
> >
> > Carbon Data supports complex and nested data types such as Arrays and
> > Struts. However, Carbon Data does not support other complex data types
> > such
> > as Maps and Union which are generally supported by popular opensource
> file
> > formats.
> >
> >
> > Supporting Map data type will require changes/additions to the DDL, Query
> > Syntax, Data Loading and Storage.
> >
> >
> > I have hosted the design on google docs for review and discussion.
> >
> > https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9
> q5-oKMqzMMQHYY/edit?usp=sharing
> >
> >
> > Below is the same inline.
> >
> >
> > 1.  DDL Changes
> >
> > Maps are key->value data types and where the value can be fetched by
> > providing the key. Hence we need to restrict keys to primitive data types
> > whereas values can be of any data type supported in Carbon(primitive and
> > complex).
> >
> > Map data types can be defined in the create table DDL as :-
> >
> > “MAP<primitive_data_type, data_type>”
> >
> > For Example:-
> >
> > create table example_table (id Int, name String, salary Int,
> > salary_breakup
> > map<String, Int>, city String)
> >
> >
> > 2.  Data Loading Changes
> >
> > Carbon should be able to support loading data into tables with Map type
> > columns from csv files. It should be possible to represent maps in a
> > single
> > row of csv. This will need carbon to support specifying the delimiters
> for
> > :-
> >
> > 1. Between two Key-Value pairs
> >
> > 2. Between each Key and Value in a pair
> >
> > As Carbon already supports Strut and Array Complex types, the data
> loading
> > process already provides support for defining delimiters for complex data
> > types. Carbon provides two Optional parameters for data loading
> >
> > 1. COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two
> > Key-Value pairs
> >
> > OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')
> >
> > 2. COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between each
> > Key and Value in a pair
> >
> > OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')
> >
> > With these delimiter options, the below map can be represented in csv:-
> >
> > Fixed->100,000
> >
> > Bonus->30,000
> >
> > Stock->40,000
> >
> > As
> >
> > Fixed:100,000$Bonus:30,000$Stock:40,000 in the csv file.
> >
> >
> >
> > 3.  Query Capabilities
> >
> > A complex datatype like Map will require additional operators to be
> > supported in the query language to fully utilize the strength of the data
> > type.
> >
> > Maps are sequence of key-value pairs, hence should support looking up
> > value
> > for a given key. Users could use the ColumnName[“key”] syntax to lookup
> > values in a map column. For example: salary_breakup[“Fixed”] could be
> used
> > to fetch only the Fixed component in the salary breakup.
> >
> > In Addition, we also need to define how maps can be used in existing
> > constructs such as select, where(filter), group by etc..
> > 1. Select:- Map data type can be directly selected or only the value
> > for a given key can be selected as per the requirement. For
> > example:-“Select
> > name, salary, salary_breakup” will return the content of map long with
> > each
> > row.“Select name, salary, salary_breakup[“Fix

Discussion(New feature) Support Complex Data Type: Map in Carbon Data

2016-10-15 Thread Vimal Das Kammath
Hi All,

This discussion is regarding support for Map Data type in Carbon Data.

CarbonData supports complex and nested data types such as Arrays and
Structs. However, CarbonData does not support other complex data types,
such as Maps and Unions, which are generally supported by popular
open-source file formats.


Supporting Map data type will require changes/additions to the DDL, Query
Syntax, Data Loading and Storage.


I have hosted the design on google docs for review and discussion.

https://docs.google.com/document/d/1U6wPohvdDHk0B7bONnVHWa6PKG8R9q5-oKMqzMMQHYY/edit?usp=sharing


Below is the same inline.


1.  DDL Changes

Maps are key->value data types where the value can be fetched by providing
the key. Hence we need to restrict keys to primitive data types, whereas
values can be of any data type supported in Carbon (primitive or complex).

Map data types can be defined in the create table DDL as :-

“MAP<primitive_data_type, data_type>”

For Example:-

create table example_table (id Int, name String, salary Int, salary_breakup
map<String, Int>, city String)


2.  Data Loading Changes

Carbon should be able to support loading data into tables with Map type
columns from CSV files. It should be possible to represent maps in a single
row of CSV. This will need Carbon to support specifying the delimiters:

1. Between two Key-Value pairs

2. Between each Key and Value in a pair

As Carbon already supports Struct and Array complex types, the data loading
process already provides support for defining delimiters for complex data
types. Carbon provides two optional parameters for data loading:

1. COMPLEX_DELIMITER_LEVEL_1: will define the delimiter between two
Key-Value pairs

OPTIONS('COMPLEX_DELIMITER_LEVEL_1'='$')

2. COMPLEX_DELIMITER_LEVEL_2: will define the delimiter between each
Key and Value in a pair

OPTIONS('COMPLEX_DELIMITER_LEVEL_2'=':')

With these delimiter options, the below map can be represented in csv:-

Fixed->100,000

Bonus->30,000

Stock->40,000

As

Fixed:100,000$Bonus:30,000$Stock:40,000 in the csv file.
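
As a hedged illustration of how a loader might split such a cell with the two
delimiter levels above (this is not CarbonData's actual parsing code, and the
thousands separators are omitted so the values parse as integers):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class MapCellParse {
      public static void main(String[] args) {
        String cell = "Fixed:100000$Bonus:30000$Stock:40000";
        Map<String, Integer> salaryBreakup = new LinkedHashMap<>();
        for (String pair : cell.split("\\$")) {     // COMPLEX_DELIMITER_LEVEL_1
          String[] kv = pair.split(":", 2);         // COMPLEX_DELIMITER_LEVEL_2
          salaryBreakup.put(kv[0], Integer.parseInt(kv[1]));
        }
        System.out.println(salaryBreakup);  // {Fixed=100000, Bonus=30000, Stock=40000}
      }
    }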



3.  Query Capabilities

A complex datatype like Map will require additional operators to be
supported in the query language to fully utilize the strength of the data
type.

Maps are sequences of key-value pairs, and hence should support looking up
the value for a given key. Users could use the ColumnName[“key”] syntax to
look up values in a map column. For example, salary_breakup[“Fixed”] could
be used to fetch only the Fixed component in the salary breakup.

In addition, we also need to define how maps can be used in existing
constructs such as select, where (filter), group by etc.

1. Select:- A map column can be selected directly, or only the value for a
given key can be selected, as per the requirement. For example: “Select
name, salary, salary_breakup” will return the content of the map along with
each row. “Select name, salary, salary_breakup[“Fixed”]” will return only
the one value from the map whose key is “Fixed”.

2. Filter:- A map column cannot be used directly in a where clause, as
where clauses can operate only on primitive data types. However, the map
lookup operator can be used in where clauses. For example: “Select name,
salary where salary_breakup[“Bonus”]>10,000”. *Note: if the value is not of
primitive type, further accessor operators need to be used, depending on
the type of the value, to arrive at a primitive type for the filter
expression to be valid.*

3. Group By:- Just like with filters, maps cannot be used directly in a
group by clause; however, the lookup operator can be used.

4. Functions:- A size() function can be provided for map types to
determine the number of key-value pairs in a map.
4.  Storage changes

As Carbon is a columnar data store, Map values will be stored using 3
physical columns

1. One column representing the Map data type. It will store the number
of fields and the start index, in just the same way as is done for Structs
and Arrays.

2. One Column for the Key

3. One column for the value, if the value is of a primitive data type;
otherwise the value itself will be multiple physical columns, depending on
the data type of the value.

Map (one logical column stored as three physical columns):

Column_1             Column_2                 Column_3
Map_Salary_Breakup   Map_Salary_Breakup.key   Map_Salary_Breakup.value

3,1                  Fixed                    1,00,000
                     Bonus                    30,000
                     Stock                    40,000
2,4                  Fixed                    1,40,000
                     Bonus                    30,000
3,6                  Fixed                    1,20,000
                     Bonus                    20,000
                     Stock                    30,000

Regards
Vimal


Re: [Discussion] Code generation in carbon result preparation

2016-10-14 Thread Vimal Das Kammath
Hi Vishal,

I think we need both solutions 1 and 2.

Solution 1 may need re-designing several parts of Carbon's query process,
from the scanner and aggregator through to result preparation. This can help
avoid the frequent cache invalidation.
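
A toy illustration of the Solution 1 idea (not CarbonData code): filling the
result batch column-by-column keeps the inner loop streaming over one
contiguous array, instead of hopping across ~80 column arrays per row and
thrashing the CPU cache.

    public class ColumnWiseFill {
      // Row-wise: the inner loop touches a different column array every step.
      static void fillRowWise(int[][] dest, int[][] src, int rows) {
        for (int r = 0; r < rows; r++) {
          for (int c = 0; c < dest.length; c++) {
            dest[c][r] = src[c][r];
          }
        }
      }

      // Column-wise: each pass is a sequential copy the prefetcher can follow.
      static void fillColumnWise(int[][] dest, int[][] src, int rows) {
        for (int c = 0; c < dest.length; c++) {
          System.arraycopy(src[c], 0, dest[c], 0, rows);
        }
      }

      public static void main(String[] args) {
        int cols = 80, rows = 1024;
        int[][] src = new int[cols][rows];
        int[][] dest = new int[cols][rows];
        fillRowWise(dest, src, rows);
        fillColumnWise(dest, src, rows);
      }
    }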

In Solution 2, code generation will not solve the frequent cache
invalidation problem. However, it will surely help to improve performance
by executing specialised code instead of generalised code, especially as we
support several data types and our code is generalised to handle all of
them.

Regards
Vimal

On Thu, Oct 13, 2016 at 3:02 AM, Aniket Adnaik 
wrote:

> Hi Vishal,
>
> In general, it is a good idea to have a cache-efficient algorithm.
>
> For solution 1: how do you want to handle variable-length columns and
> nulls? Maybe you will have to maintain variable-length columns separately
> and use offsets?
>
> For solution 2: code generation may be the more efficient solution. We
> should find all the other places in the executor that can benefit from
> code generation apart from row formation. BTW, any specific code
> generation library you have in mind?
>
> Best Regards,
> Aniket
>
> On Wed, Oct 12, 2016 at 10:02 AM, Kumar Vishal 
> wrote:
>
> > Hi Jacky,
> > Yes, result preparation on the executor side.
> >
> > -Regards
> > Kumar Vishal
> >
> > On Wed, Oct 12, 2016 at 9:33 PM, Jacky Li  wrote:
> >
> > > Hi Vishal,
> > >
> > > Which part of the preparation are you considering? The column stitching
> > in
> > > the executor side?
> > >
> > > Regards,
> > > Jacky
> > >
> > > > On Oct 12, 2016, at 9:24 PM, Kumar Vishal wrote:
> > > >
> > > > Hi All,
> > > > Currently we are preparing the final result row-wise. As the number
> > > > of columns present in the project list (80 columns) is high, mainly
> > > > measure columns or no-dictionary columns, there is a lot of CPU cache
> > > > invalidation happening, and this is slowing down query performance.
> > > >
> > > > *I can think of two solutions for this problem.*
> > > > *Solution 1*. Fill column data vertically; currently it is filled
> > > > horizontally (this may not solve all the problems).
> > > > *Solution 2*. Use code generation for result preparation.
> > > >
> > > > This is an initial idea.
> > > >
> > > > -Regards
> > > > Kumar Vishal
> > >
> > >
> > >
> > >
> >
>


Re: Disscusion shall CI support run carbondata based on multi version spark?

2016-10-14 Thread Vimal Das Kammath
Yes, I agree. The CI should be configured to build Carbon against different
Spark versions.

On Fri, Oct 14, 2016 at 7:56 AM, Liang Chen  wrote:

>
> Yes, we need to solve it; the CI should support different Spark versions.
>
> Regards
> Liang
>
>
> zhujin wrote
> > One issue:
> > I modified the spark.version in pom.xml to use Spark 1.6.2, and then
> > compilation failed.
> >
> >
> > Root cause:
> > There was an "unused import statement" warning in the CarbonOptimizer
> > class before; we imported AggregateExpression like the following:
> > import org.apache.spark.sql.catalyst.expressions.aggregate._
> > import org.apache.spark.sql.catalyst.expressions._
> > But in Spark 1.6.2, AggregateExpression is moved to the subpackage
> > "aggregate" from "expressions" (where it lives in Spark 1.5.2).
> > So if we didn't know about this change and removed the import "import
> > org.apache.spark.sql.catalyst.expressions.aggregate._", it would cause a
> > compilation failure when using Spark 1.6.2.
> >
> >
> > Question:
> > So, maybe the CI should verify CarbonData against different Spark
> > versions; that would be more helpful for checking the correctness of the
> > commits. Shall we?
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/Disscusion-
> shall-CI-support-run-carbondata-based-on-multi-
> version-spark-tp1836p1890.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>


Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Vimal Das Kammath
The global dictionary is the key feature behind Carbon's impressive query
performance, as it enables late materialisation, allowing Carbon to execute
queries using less memory and CPU. It also indirectly helps Carbon perform
better in concurrent query scenarios, as any block can be processed by any
node without having to load a local dictionary.

I agree that the 2-pass approach is not optimal when it comes to load
performance. I see that we have a lot of good alternatives suggested in
this discussion. We need to quantitatively evaluate each of the approaches
to come to a conclusion.

1) Support for completely local dictionaries:- This will surely avoid the
2-pass issue. I think that we can have this as an option, but it need not
be the default, because the benefit in query performance that we get from
the global dictionary far outweighs the performance overhead during data
load. We can check current and future customer scenarios to validate
whether providing this option will benefit any of them. In that case we can
implement it as an optional flag during table creation.

2) Support for an external dictionary:- The current approach copies the
externally supplied dictionary into Carbon's global dictionary. Aniket's
suggestion of supporting a completely external dictionary through an
interface is a good one. I guess Ravindra's suggestion of using external
k/v stores or distributed maps can be implemented behind this interface.
But we need to test the performance of various distributed maps/key-value
stores and decide whether this is a viable option, because if this approach
is slower than the 2-pass approach, it won't make sense to invest in it.
However, I support the idea of having an external dictionary interface.
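
As a rough sketch of what such an interface might look like (all names here
are hypothetical, not CarbonData APIs):

    /** A pluggable dictionary, possibly backed by a distributed map or KV store. */
    public interface ExternalDictionaryProvider {
      /** Return the surrogate key for a value, assigning a new one if unseen. */
      int getOrAssignSurrogateKey(String columnName, String value);

      /** Reverse lookup used at query time for late materialisation. */
      String lookupValue(String columnName, int surrogateKey);
    }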

3) External tool to generate the dictionary:- My opinion is that this just
de-couples the first pass and moves it outside the data load process. But
from the user's perspective, they still need to run the tool first, to
generate the dictionary, before loading data. Our 2-pass approach just
automates this.

Regards
Vimal

On Fri, Oct 14, 2016 at 11:32 PM, Ravindra Pesala 
wrote:

> Hi Jihong,
>
> I agree, we can use an external tool for the first load, but for
> incremental loads we should have a solution to add to the global
> dictionary. So this solution should be enough to generate the global
> dictionary even if the user does not use the external tool the first time.
> That solution could be a distributed map or a KV store.
>
> Regards,
> Ravi.
>
> On 14 October 2016 at 23:12, Jihong Ma  wrote:
>
> > Hi Liang,
> >
> > This tool is more or less like the first load, run once after the table
> > is created; any subsequent loads/incremental loads will proceed and are
> > capable of updating the global dictionary when they encounter a new
> > value. This is the easiest way of achieving a 1-pass data loading process
> > without too much overhead.
> >
> > Since this tool is only triggered once per table, it is not considered
> > too much of a burden on the end users. Getting global dictionary
> > generation out of the way of regular data loading is the key here.
> >
> > Jihong
> >
> > -Original Message-
> > From: Liang Chen [mailto:chenliang6...@gmail.com]
> > Sent: Thursday, October 13, 2016 5:39 PM
> > To: dev@carbondata.incubator.apache.org
> > Subject: RE: Discussion(New feature) regarding single pass data loading
> > solution.
> >
> > Hi jihong
> >
> > I am not sure that users will accept using an extra tool to do this
> > work, because providing a tool or doing a scan the first time per table
> > for most of the global dict has the same cost from the user's
> > perspective, and maintaining the dict file is also the same cost; they
> > always expect that the system can automatically and internally generate
> > the dict file during data loading.
> >
> > Can we consider this:
> > first load: do a scan to generate most of the global dict file, then
> > copy this file to each load node for subsequent loads.
> >
> > Regards
> > Liang
> >
> >
> > Jihong Ma wrote
> > > The question is: what would be the default implementation? Load data
> > > without a dictionary?
> > >
> > > My thought is we can provide a tool to generate the global dictionary
> > > using a sample data set, so the initial global dictionaries are
> > > available before normal data loading. We shall be able to perform
> > > encoding based on that; we only need to handle occasionally adding
> > > entries while loading. For columns specified with global dictionary
> > > encoding where the dictionary is not in place before data loading, we
> > > error out and direct the user to use the tool first.
> > >
> > > Make sense?
> > >
> > > Jihong
> > >
> > > -Original Message-
> > > From: Ravindra Pesala [mailto:
> >
> > > ravi.pesala@
> >
> > > ]
> > > Sent: Thursday, October 13, 2016 1:12 AM
> > > To: dev
> > > Subject: Re: Discussion(New feature) regarding single pass data loading
> > > solution.
> > >
> > > Hi Jihong/Aniket,
> > >
> > > In the current implementation of carbondata we are already han

Re: [VOTE] Apache CarbonData 0.1.1-incubating release

2016-10-05 Thread Vimal Das Kammath
I too missed the release vote. I apologize for the same.
On Oct 5, 2016 8:25 PM, "Henry Saputra"  wrote:

> I totally missed this. I will vet and reply to general@ list.
>
> I apologize for missing the release VOTE
>
> On Mon, Sep 26, 2016 at 2:11 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi all,
> >
> > I submit the CarbonData 0.1.1-incubating to your vote.
> >
> > Release Notes:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> > ctId=12320220&version=12338021
> >
> > Staging Repository:
> > https://repository.apache.org/content/repositories/orgapache
> > carbondata-1003
> >
> > Git Tag:
> > carbondata-0.1.1-incubating
> >
> > Please vote to approve this release:
> >
> > [ ] +1 Approve the release
> > [ ] -1 Don't approve the release (please provide specific comments)
> >
> > This vote will be open for at least 72 hours. If this vote passes (we
> need
> > at least 3 binding votes, meaning three  votes from the PPMC), I will
> > forward to gene...@incubator.apache.org for  the IPMC votes. Thanks
> > Regards JB
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [Discussion] Support Date/Time format for Timestamp columns to be defined at column level

2016-09-24 Thread Vimal Das Kammath
Hi,

The date format at column level is meant to support loading data into
Carbon from CSV files which have multiple columns with different date
formats. Currently, data loading will fail if the date columns are in
different formats.
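
For illustration, per-column formats could be handled with standard Java
date parsing along these lines (the formats here are examples, not
CarbonData configuration):

    import java.time.LocalDate;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class PerColumnDateParse {
      public static void main(String[] args) {
        DateTimeFormatter col1Fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        DateTimeFormatter col2Fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        LocalDate col1 = LocalDate.parse("2016-09-24", col1Fmt);
        LocalDateTime col2 = LocalDateTime.parse("2016-09-25 00:00:00", col2Fmt);

        System.out.println(col1 + " | " + col2);
      }
    }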

For the query result, my opinion is that all the date columns in the
result should be in a single format.

Regards
Vimal


On Sep 24, 2016 2:11 PM, "向志强"  wrote:

> Hi, all
>
> Recently, I have been trying to handle issue CARBONDATA-37. We are trying
> to support setting the Date format at column level.
>
> There is a question of whether we should return each Date column in its
> own format or return a uniform format.
>
> For example.
>
> we create a table and define two cols which data type is Date. But the Date
> format is different.
>
> col1(Date)   col2(Date)
> 2016-09-24 2016-09-25 00:00:00
>
> When querying, which of the two formats below should be returned?
>
>  col1(Date)   col2(Date)
> 2016-09-24 2016-09-25 00:00:00
>
> or
>
>  col1(Date) col2(Date)
> 2016-09-24 00:00:00 2016-09-25 00:00:00
>
> if we set yyyy-MM-DD HH:MM:SS as the default format.
>
>
> Best wishes!
>


[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-05 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77496731
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +83,148 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  Iterator<Map.Entry<String, List<QueryStatistic>>> entries =
+  queryStatisticsMap.entrySet().iterator();
+  while (entries.hasNext()) {
+Map.Entry<String, List<QueryStatistic>> entry = entries.next();
+String queryId = entry.getKey();
+// clear the unknown query statistics
+if(StringUtils.isEmpty(queryId)) {
+  queryStatisticsMap.remove(queryId);
--- End diff --

Use Iterator.remove() for better safety.
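
A standalone demonstration of that pattern (types simplified from the
queryId -> statistics mapping in the diff):

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    public class IteratorRemoveDemo {
      public static void main(String[] args) {
        Map<String, List<String>> stats = new HashMap<>();
        stats.put("", List.of("orphan"));        // an "unknown" query id
        stats.put("12345", List.of("parse_t"));

        Iterator<Map.Entry<String, List<String>>> it = stats.entrySet().iterator();
        while (it.hasNext()) {
          Map.Entry<String, List<String>> entry = it.next();
          if (entry.getKey().isEmpty()) {
            it.remove();  // safe structural removal during iteration
          }
        }
        System.out.println(stats);  // {12345=[parse_t]}
      }
    }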




[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-03 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77435578
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +82,142 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  for (String key: queryStatisticsMap.keySet()) {
+// print 
sql_parse_t,load_meta_t,block_allocation_t,block_identification_t
+// or just print block_allocation_t,block_identification_t
+if (queryStatisticsMap.get(key).size() >= 2) {
+  String tableInfo = collectDriverStatistics(key);
+  if (null != tableInfo) {
+LOGGER.statistic(tableInfo);
+  }
+}
+// clear timeout query statistics
+if(StringUtils.isEmpty(key)) {
+  queryStatisticsMap.remove(key);
+} else {
+  long interval = System.nanoTime() - Long.parseLong(key);
+  if (interval > 
QueryStatisticsConstants.CLEAR_STATISTICS_TIMEOUT) {
+queryStatisticsMap.remove(key);
+  }
+}
   }
 }
   }
 
   /**
* Below method will parse queryStatisticsMap and put time into table
*/
-  public String collectDriverStatistics() {
-for (String key: queryStatisticsMap.keySet()) {
-  try {
-// TODO: get the finished query, and print Statistics
-if (queryStatisticsMap.get(key).size() > 3) {
-  String sql_parse_time = "";
-  String load_meta_time = "";
-  String block_allocation_time = "";
-  String block_identification_time = "";
-  Double driver_part_time_tmp = 0.0;
-  String splitChar = " ";
-  // get statistic time from the QueryStatistic
-  for (QueryStatistic statistic : queryStatisticsMap.get(key)) {
-switch (statistic.getMessage()) {
-  case QueryStatisticsConstants.SQL_PARSE:
-sql_parse_time += statistic.getTimeTaken() + splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.LOAD_META:
-load_meta_time += statistic.getTimeTaken() + splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.BLOCK_ALLOCATION:
-block_allocation_time += statistic.getTimeTaken() + 
splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  case QueryStatisticsConstants.BLOCK_IDENTIFICATION:
-block_identification_time += statistic.getTimeTaken() + 
splitChar;
-driver_part_time_tmp += statistic.getTimeTaken();
-break;
-  default:
-break;
-}
-  }
-  String driver_part_time = driver_part_time_tmp + splitChar;
-  // structure the query statistics info table
-  StringBuilder tableInfo = new StringBuilder();
-  int len1 = 8;
-  int len2 = 20;
-  int len3 = 21;
-  int len4 = 22;
-  String line = "+" + printLine("-", len1) + "+" + printLine("-", 
len2) + "+" +
-  printLine("-", len3) + "+" + printLine("-", len4) + "+";
-  String line2 = "|" + printLine(" ", len1) + "+" + printLine("-", 
len2) + "+" +
-  printLine(" ", len3) + "+" + printLine("-", len4) + "+";
-  // table header
-  tableInfo.append(line).append("\n");
-  tableInfo.append("|" + printLine(" ", (len1 - 
"Module".length())) + "Module" + "|" +
-  printLine(" ", (len2 - "Operation Step".length())) + 
"Operation Step" + "|" +
-  printLine(" ", (len3 + len4 + 1 - "Query Cost".length())) +
-  "Query Cost" + "|" + "\n");
-  // driver part
-  tableInfo.append(line).append("\n");
-  tableInfo.append("|" + 

[GitHub] incubator-carbondata pull request #123: [CARBONDATA-204] Clear queryStatisti...

2016-09-03 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/123#discussion_r77435539
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/querystatistics/DriverQueryStatisticsRecorder.java
 ---
@@ -78,106 +82,142 @@ public synchronized void 
recordStatisticsForDriver(QueryStatistic statistic, Str
*/
   public void logStatisticsAsTableDriver() {
 synchronized (lock) {
-  String tableInfo = collectDriverStatistics();
-  if (null != tableInfo) {
-LOGGER.statistic(tableInfo);
+  for (String key: queryStatisticsMap.keySet()) {
+// print 
sql_parse_t,load_meta_t,block_allocation_t,block_identification_t
+// or just print block_allocation_t,block_identification_t
+if (queryStatisticsMap.get(key).size() >= 2) {
--- End diff --

The call can return null: the map can be modified while we iterate over the
key set, so if the entry is removed in the meantime, get() can return null.

Solution: use an iterator over entrySet().




[GitHub] incubator-carbondata pull request #110: [CARBONDATA-193]Fix the bug that neg...

2016-08-30 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/110#discussion_r76929816
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/ValueCompressionUtil.java ---
@@ -78,26 +78,26 @@ private ValueCompressionUtil() {
   private static DataType getDataType(double value, int decimal, byte 
dataTypeSelected) {
 DataType dataType = DataType.DATA_DOUBLE;
 if (decimal == 0) {
-  if (value < Byte.MAX_VALUE) {
+  if (value < Byte.MAX_VALUE && value > Byte.MIN_VALUE) {
--- End diff --

Should we change this to value <= Byte.MAX_VALUE && value >= Byte.MIN_VALUE?
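
A quick check of the boundary in question: with strict '<', a value equal to
Byte.MAX_VALUE is pushed to a wider type even though it fits in a byte.

    public class ByteBoundary {
      public static void main(String[] args) {
        double value = 127.0;  // == Byte.MAX_VALUE
        System.out.println(value < Byte.MAX_VALUE);   // false: byte bucket skipped
        System.out.println(value <= Byte.MAX_VALUE);  // true: byte bucket accepted
      }
    }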




[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/92#discussion_r76516430
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java
 ---
@@ -410,6 +410,14 @@ public void writeLoadDetailsIntoFile(String 
dataLoadLocation,
   for (LoadMetadataDetails loadMetadata : 
listOfLoadFolderDetailsArray) {
 
 if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) {
+  // if the segment is compacted then no need to delete that.
+  if (CarbonCommonConstants.SEGMENT_COMPACTED
+  .equalsIgnoreCase(loadMetadata.getLoadStatus())) {
+LOG.error("Cannot delete the load which is compacted.");
--- End diff --

In the log, mention the load id for which you are giving this error.
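
For example (a sketch reusing the loadId variable visible in the diff):

    LOG.error("Cannot delete the load " + loadId + " as it has been compacted.");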




[GitHub] incubator-carbondata pull request #95: [CARBONDATA-152]Double min max differ...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/95#discussion_r76512969
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/compression/type/UnCompressNonDecimalMaxMinByte.java
 ---
@@ -93,7 +95,9 @@
   if (value[i] == 0) {
 vals[i] = maxValue;
   } else {
-vals[i] = (maxValue - value[i]) / Math.pow(10, decimalVal);
+BigDecimal diff = new BigDecimal(Double.toString(value[i] / 
Math.pow(10, decimalVal)));
--- End diff --

Use BigDecimal.valueOf(double) instead of converting to a string, for
performance and data-loss concerns. Refer to
http://www.javaworld.com/article/2073176/caution--double-to-bigdecimal-in-java.html
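
A small demonstration of the difference (standard JDK behaviour):

    import java.math.BigDecimal;

    public class BigDecimalDemo {
      public static void main(String[] args) {
        // Raw binary expansion of the double:
        System.out.println(new BigDecimal(0.1));
        // 0.1000000000000000055511151231257827021181583404541015625

        // Canonical short form:
        System.out.println(BigDecimal.valueOf(0.1));               // 0.1
        System.out.println(new BigDecimal(Double.toString(0.1)));  // 0.1
      }
    }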




[GitHub] incubator-carbondata pull request #95: [CARBONDATA-152]Double min max differ...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/95#discussion_r76512976
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/compression/type/UnCompressNonDecimalMaxMinFloat.java
 ---
@@ -98,7 +99,9 @@
   if (value[i] == 0) {
 vals[i] = maxValue;
   } else {
-vals[i] = (maxValue - value[i]) / Math.pow(10, decimal);
+BigDecimal diff = new BigDecimal(Double.toString(value[i] / 
Math.pow(10, decimal)));
--- End diff --

Use BigDecimal.valueOf(double) instead of converting to a string, for
performance and data-loss concerns. Refer to
http://www.javaworld.com/article/2073176/caution--double-to-bigdecimal-in-java.html




[GitHub] incubator-carbondata pull request #95: [CARBONDATA-152]Double min max differ...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/95#discussion_r76512979
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/compression/type/UnCompressNonDecimalMaxMinInt.java
 ---
@@ -93,7 +94,9 @@
   if (value[i] == 0) {
 vals[i] = maxValue;
   } else {
-vals[i] = (maxValue - value[i]) / Math.pow(10, decimal);
+BigDecimal diff = new BigDecimal(Double.toString(value[i] / 
Math.pow(10, decimal)));
--- End diff --

Use BigDecimal.valueOf(double) instead of converting to a string, for
performance and data-loss concerns. Refer to
http://www.javaworld.com/article/2073176/caution--double-to-bigdecimal-in-java.html




[GitHub] incubator-carbondata pull request #95: [CARBONDATA-152]Double min max differ...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/95#discussion_r76512974
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/compression/type/UnCompressNonDecimalMaxMinDefault.java
 ---
@@ -95,7 +96,9 @@
   if (value[i] == 0) {
 vals[i] = maxVal;
   } else {
-vals[i] = (maxVal - value[i]) / Math.pow(10, decimal);
+BigDecimal diff = new BigDecimal(Double.toString(value[i] / 
Math.pow(10, decimal)));
--- End diff --

Use BigDecimal.valueOf(double) instead of converting to a string, for
performance and data-loss concerns. Refer to
http://www.javaworld.com/article/2073176/caution--double-to-bigdecimal-in-java.html




[GitHub] incubator-carbondata pull request #97: [CARBONDATA-180]give proper error mes...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/97#discussion_r76512936
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -689,6 +689,10 @@ object GlobalDictionaryUtil extends Logging {
   generatePredefinedColDictionary(colDictFilePath, table,
 dimensions, carbonLoadModel, sqlContext, hdfsLocation, 
dictfolderPath)
 }
+if (headers.length > df.columns.length) {
+  logError("Either delimiter or fileheader provided is not 
correct")
--- End diff --

The log can be more informative, like:
"The number of columns in the file header does not match the number of
columns in the data file; either the delimiter or the fileheader provided is
not correct"




[GitHub] incubator-carbondata pull request #100: Handle all dictionary exception more...

2016-08-27 Thread Vimal-Das
Github user Vimal-Das commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/100#discussion_r76512895
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -629,22 +631,32 @@ object GlobalDictionaryUtil extends Logging {
 // filepath regex, look like "/path/*.dictionary"
 if (filePath.getName.startsWith("*")) {
   val dictExt = filePath.getName.substring(1)
-  val listFiles = filePath.getParentFile.listFiles()
-  if (listFiles.exists(file =>
-file.getName.endsWith(dictExt) && file.getSize > 0)) {
-true
+  if (filePath.getParentFile.exists()) {
+val listFiles = filePath.getParentFile.listFiles()
+if (listFiles.exists(file =>
+  file.getName.endsWith(dictExt) && file.getSize > 0)) {
+  true
+} else {
+  logWarning("[ALL_DICTIONARY] No dictionary files found or empty 
dictionary files! " +
+"Won't generate new dictionary.")
+  false
+}
   } else {
-logInfo("No dictionary files found or empty dictionary files! " +
-  "Won't generate new dictionary.")
-false
+throw new FileNotFoundException(
+  "[ALL_DICTIONARY] The given dictionary file path not found!")
   }
 } else {
-  if (filePath.exists() && filePath.getSize > 0) {
-true
+  if (filePath.exists()) {
+if (filePath.getSize > 0) {
+  true
+} else {
+  logWarning("[ALL_DICTIONARY] No dictionary files found or empty dictionary files! " +
+"Won't generate new dictionary.")
+  false
+}
   } else {
-logInfo("No dictionary files found or empty dictionary files! " +
-  "Won't generate new dictionary.")
-false
+throw new FileNotFoundException(
+  "[ALL_DICTIONARY] The given dictionary file path not found!")
--- End diff --

Correct the English grammar in the log messages:
"The given dictionary file path not found!" => "The given dictionary file path 
**is** not found!"
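
Distilled into a standalone sketch (written against plain java.io.File rather 
than CarbonData's file abstraction, purely for illustration), the intended 
control flow of the diff is:

    import java.io.{File, FileNotFoundException}

    // Returns true only when a non-empty dictionary file exists; an empty
    // file is tolerated with a warning, while a missing path is a hard error.
    def hasUsableDictionary(file: File): Boolean = {
      if (!file.exists()) {
        throw new FileNotFoundException(
          s"[ALL_DICTIONARY] The given dictionary file path ${file.getPath} was not found!")
      }
      if (file.length() > 0) {
        true
      } else {
        println(s"[ALL_DICTIONARY] Empty dictionary file ${file.getPath}; " +
          "won't generate new dictionary.")
        false
      }
    }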




Re: [VOTE] Apache CarbonData 0.1.0-incubating release

2016-08-20 Thread Vimal Das Kammath
+1 (binding)
On Aug 20, 2016 12:27 AM, "Jean-Baptiste Onofré"  wrote:

> Hi all,
>
> I submit the first CarbonData release to your vote.
>
> Release Notes:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12337895
>
> Staging Repository:
> https://repository.apache.org/content/repositories/orgapachecarbondata-1000/
>
> Git Tag:
> carbondata-0.1.0-incubating
>
> Please vote to approve this release:
>
> [ ] +1 Approve the release
> [ ] -1 Don't approve the release (please provide specific comments)
>
> This vote will be open for at least 72 hours.
>
> If this vote passes (we need at least 3 binding votes, meaning three votes
> from the PPMC), I will forward to gene...@incubator.apache.org for the
> IPMC votes.
>
> Thanks
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Open discussion and Vote: What kind of JIRA issue events need send mail to dev@carbondata.incubator.apache.org

2016-08-19 Thread Vimal Das Kammath
+1 for option 2
On Aug 19, 2016 6:36 AM, "chenliang613"  wrote:

> Hi
>
> I agree Henry's proposal.
>
> Issue Created events send mails to the dev@ mailing list.
> Issue Created and all other JIRA events send mails to a new mailing list,
> issues@.
>
>
> Regards
> Liang
>
>
>
>


Re: [PROPOSAL] How to merge a pull request

2016-08-10 Thread Vimal Das Kammath
+1
Great idea. Having it as a tool, as Henry suggested, would definitely make
life easier; a rough sketch of such a helper follows.
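
As a rough sketch only (the object name and argument handling here are made 
up; Spark's merge_spark_pr.py is the real reference), such a helper could 
simply drive the procedure JB describes below:

    import scala.sys.process._

    // Hypothetical merge helper, assuming the "apache" and "github" remotes
    // from the prerequisite section below are already configured.
    object MergePr extends App {
      val prNumber = args(0) // e.g. "63"
      val jiraId   = args(1) // e.g. "CARBONDATA-140"
      val branch   = s"pr-$prNumber"

      Seq("git", "fetch", "--all").!
      Seq("git", "checkout", "-b", branch, s"github/pr/$prNumber").!
      // ... review, build, and squash/rebase interactively at this point ...
      Seq("git", "checkout", "master").!
      Seq("git", "merge", "--no-ff", "-m", s"[$jiraId] This closes #$prNumber", branch).!
      Seq("git", "push").!
      Seq("git", "branch", "-D", branch).!
    }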

On Wed, Aug 10, 2016 at 12:29 PM, Jihong Ma  wrote:

> +1
>
> Great idea and I am sure it will make our life a lot easier as committers!!
>
> Jihong
>
> Sent from HUAWEI AnyOffice
> From: Jacky Li
> To: dev@carbondata.incubator.apache.org;
> Subject: Re: [PROPOSAL] How to merge a pull request
>
> Time: 2016-08-09 20:56:25
> definitely +1
>
>
> > 在 2016年8月9日,下午1:33,Jean-Baptiste Onofré  写道:
> >
> > Yes good idea.
> >
> > I'm thinking about a github PR template too as we use in Beam.
> >
> > Regards
> > JB
> >
> > On 08/09/2016 07:31 AM, Henry Saputra wrote:
> >> This is great stuff, thanks for taking stab at it, JB.
> >>
> >> I would recommend we add a tool in the source code to help committers
> >> merge PRs.
> >>
> >> Some projects like Apache Spark [1] and Apache Flink have a simple script
> >> to help automate the process.
> >> We could adopt the script to do similar thing for CarbonData.
> >>
> >> - Henry
> >>
> >> [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
> >>
> >> On Fri, Aug 5, 2016 at 5:27 AM, Jean-Baptiste Onofré 
> >> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> I discussed with Ravi how to cleanly merge a pull request, eventually
> >>> applying changes, keeping the original commit author, etc.
> >>>
> >>> I proposed a procedure:
> >>>
> >>> https://github.com/apache/incubator-carbondata/pull/63#issuecomment-237817370
> >>>
> >>> For convenience, let me paste the proposal here:
> >>>
> >>> Prerequisite
> >>>
> >>> Assuming, you cloned the Apache git repo:
> >>>
> >>> git clone https://git-wip-us.apache.org/repos/asf/incubator-carbondata
> >>> I advise to rename origin remote as apache:
> >>>
> >>> git remote rename origin apache
> >>> Now, let's add the github remote:
> >>>
> >>> git remote add github https://github.com/apache/incubator-carbondata
> >>> For convenience, we add a new fetch reference for the pull requests:
> >>>
> >>> git config --local --add remote.github.fetch '+refs/pull/*/head:refs/remotes/github/pr/*'
> >>> Then, we can fetch all, including the pull requests:
> >>>
> >>> git fetch --all
> >>> Pull Request Branch
> >>>
> >>> Now, we are ready to checkout a pull request in a specific branch:
> >>>
> >>> git checkout -b pr-63 github/pr/63
> >>> You are now on the pull request (#63) branch: you can review and test the
> >>> pull request (building with Maven, verify, ...).
> >>>
> >>> Then, you can amend the commit, squash several commits in one, rebase,
> >>> etc. Basically, it's where you are preparing the merge.
> >>>
> >>> Merging the Pull Request
> >>>
> >>> Once the pull request branch is ready, you can merge on master:
> >>>
> >>> git checkout master
> >>> git merge --no-ff -m "[CARBONDATA-140] This closes #63" pr-63
> >>> git push
> >>> Once the merge has been done, you can delete the pull request branch:
> >>>
> >>> git branch -D pr-63
> >>>
> >>>
> >>> Thoughts ?
> >>>
> >>> Regards
> >>> JB
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
>
>
>


Re: Discussing to add carbondata-tools module

2016-08-05 Thread Vimal Das Kammath
+1
On Aug 5, 2016 2:45 PM, "Jean-Baptiste Onofré"  wrote:

> I guess it's where we can put the checkstyle, ... (what we have in the dev
> folder right now), correct ?
>
> Regards
> JB
>
> On 08/05/2016 08:31 AM, QiangCai wrote:
>
>> Hi all,
>>
>>   To improve the CarbonData system's usability and maintainability, I
>> suggest adding a carbondata-tools module.
>>   I think this module should provide some command tools as following.
>>
>>   1. import
>>   import a data file/folder to any existing table
>>
>>   2. export
>>   export the given columns to a file
>>
>>   3. schema
>>   show the detailed information of the specified table schema file
>>
>>   4. metadata
>>   show tablestatus metadata
>>   show the history track for data loading and compaction
>>
>>   5. footer
>>   show blocklet metadata list
>>   show start/end key, min/max value, row number, total size for the
>> specified blocklet
>>
>>   6. blocklet
>>   show blocklet list
>>   show blocklet data, RLE map, Inverted index for the given columns
>>
>>   7. index
>>   show BTree node list
>>   show node information (start/end key, min/max value)
>>
>>   8. dictionary
>>   show key-value list of the specified global/local dictionary file
>>   show sort index
>>   show dictionary metadata
>>
>> Thank you, and I look forward to your opinions on this carbondata-tools
>> module.
>>
>> David Cai
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [jira] [Created] (CARBONDATA-42) Missing Code on Github for Compilation

2016-07-05 Thread Vimal Das Kammath
Probably org.carbondata.format is not getting built in your environment.
org.carbondata.format requires the Apache Thrift compiler in order to be built.

Can you share the complete console messages, or try to build directly with
Maven and share the detailed error message, so that we can assist further.

On Wed, Jul 6, 2016 at 6:29 AM, Ahmed Abdelhamid (JIRA) 
wrote:

> Ahmed Abdelhamid created CARBONDATA-42:
> --
>
>  Summary: Missing Code on Github for Compilation
>  Key: CARBONDATA-42
>  URL: https://issues.apache.org/jira/browse/CARBONDATA-42
>  Project: CarbonData
>   Issue Type: Bug
>  Environment: Error:(35, 24) java: package org.carbondata.format
> does not exist
> Reporter: Ahmed Abdelhamid
> Priority: Blocker
>
>
> The package:
>
> Error:(35, 24) java: package org.carbondata.format does not exist
>
> is not available in the source code on GitHub, and it is causing 100 errors
> when trying to make the project in IntelliJ.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


Re: Urgent Error: package org.carbondata.format does not exist

2016-07-05 Thread Vimal Das Kammath
Probably org.carbondata.format is not getting built in your environment.
org.carbondata.format requires the Apache Thrift compiler in order to be built.

Can you share the Maven console messages so that we can assist further.

On Wed, Jul 6, 2016 at 7:55 AM, Ahmed Abdelhamid  wrote:

> Hi All,
>
> The subject error happens when I clone the Github repository and try to
> make carbon-core.
>
> I believe there is a possibility that there are some missing code from the
> repository.
>
> Please advise,
>
> Samy
>


[jira] [Created] (CARBONDATA-39) ColumnGroup Feature Should Support Non-Dictionary Encoded Columns as well.

2016-07-05 Thread Vimal Das Kammath (JIRA)
Vimal Das Kammath created CARBONDATA-39:
---

 Summary: ColumnGroup Feature Should Support Non-Dictionary Encoded Columns as well.
 Key: CARBONDATA-39
 URL: https://issues.apache.org/jira/browse/CARBONDATA-39
 Project: CarbonData
  Issue Type: Improvement
Reporter: Vimal Das Kammath


The ColumnGroup feature currently supports only dictionary-encoded columns.

To be really effective, column groups should support non-dictionary-encoded 
columns as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-38) Cleanup carbon.properties

2016-07-05 Thread Vimal Das Kammath (JIRA)
Vimal Das Kammath created CARBONDATA-38:
---

 Summary: Cleanup carbon.properties
 Key: CARBONDATA-38
 URL: https://issues.apache.org/jira/browse/CARBONDATA-38
 Project: CarbonData
  Issue Type: Improvement
Reporter: Vimal Das Kammath


The carbon.properties.template file has several stale configurations that are 
no longer used by the code.

Please clean these up and retain only the valid configuration properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-07-05 Thread Vimal Das Kammath (JIRA)
Vimal Das Kammath created CARBONDATA-37:
---

 Summary: Support Date/Time format for Timestamp columns to be defined at column level
 Key: CARBONDATA-37
 URL: https://issues.apache.org/jira/browse/CARBONDATA-37
 Project: CarbonData
  Issue Type: Improvement
Reporter: Vimal Das Kammath


Carbon supports defining the Date/Time format, but the configuration for it 
lives in carbon.properties and is therefore global for all tables.

This global timestamp-format configuration cannot support scenarios where 
different tables, or different Timestamp columns within the same table, 
require different formats.

Suggest providing an option in the CREATE TABLE DDL itself to define the 
format for each Timestamp column. Also provide defaults so that users can 
create tables with Timestamp columns without always having to define the 
Date/Time format.
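
A hypothetical sketch of what such a per-column option could look like 
(illustrative syntax only: the TBLPROPERTIES keys shown here do not exist and 
are exactly what this issue proposes to design):

    import org.apache.spark.sql.CarbonContext

    // Hypothetical syntax sketch -- the per-column dateformat keys are made up.
    def createSalesTable(cc: CarbonContext): Unit = {
      cc.sql(
        """CREATE TABLE sales (
          |  order_time TIMESTAMP,
          |  ship_time  TIMESTAMP,
          |  amount     DOUBLE
          |) STORED BY 'carbondata'
          |TBLPROPERTIES ('order_time.dateformat' = 'yyyy-MM-dd HH:mm:ss',
          |               'ship_time.dateformat'  = 'yyyy/MM/dd')""".stripMargin)
    }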



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)