Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-03-29 Thread Bhavya Aggarwal
Congratulations Kunal.

Thanks and regards
Bhavya

On Sun, Mar 29, 2020 at 12:43 PM Akash r  wrote:

> Congratulations Kunal.
>
>
> Regards,
> Akash R Nilugal
>
> On Sun, Mar 29, 2020, 12:37 PM Liang Chen  wrote:
>
> > Hi
> >
> >
> > We are pleased to announce Kunal Kapoor as a new PMC member for Apache
> > CarbonData.
> >
> >
> > Congrats to Kunal Kapoor!
> >
> >
> > Apache CarbonData PMC
> >
>


-- 
*Bhavya Aggarwal*
CTO & Partner
Knoldus Inc. <http://www.knoldus.com/>
+91-9910483067
Canada - USA - India - Singapore
<https://in.linkedin.com/company/knoldus> <https://twitter.com/Knolspeak>
<https://www.facebook.com/KnoldusSoftware/> <https://blog.knoldus.com/>

-- 
Your feedback matters - At Knoldus we aim to be very professional in our 
quality of work, commitment to results, and proactive communication. If you 
feel otherwise please share your feedback 
<https://forms.gle/Ax1Te1DDpirAQuQ8A> and we would work on it. 


Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-06 Thread Bhavya Aggarwal
+1

On Mon, Jan 7, 2019 at 8:08 AM David CaiQiang  wrote:

> +1
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>




Re: [Discussion] Propose to upgrade the version of integration/presto from 0.187 to 0.206

2018-07-24 Thread Bhavya Aggarwal
Hi Dev,

Yes, we should definitely go for the 0.206 upgrade for Presto, as we are now
using the dictionary_aggregation feature for optimization. The other bug
fixes are also important for the CarbonData integration.
However, they have changed the connector interface as well, so we might
need to change our interface accordingly.

Thanks and regards
Bhavya

On Tue, Jul 24, 2018 at 2:11 PM, Liang Chen  wrote:

> Hi Dev
>
> The Presto community released 0.206 last week (refer to the details at
> https://prestodb.io/docs/current/release/release-0.206.html). This
> release fixed many issues, so I propose that the Apache CarbonData
> community upgrade to the latest Presto version for the CarbonData
> integration.
>
> Please provide your opinion.
>
> Regards
> Liang
>





Re: query carbondata by presto got error : tableCacheModel.carbonTable should not be null

2018-06-22 Thread Bhavya Aggarwal
Hi,
Are you running it locally or on a cluster? This error should not occur,
but if you send the stack trace we can resolve it.

Regards
Bhavya

On Fri, Jun 22, 2018 at 12:51 PM, 陈星宇  wrote:

> hi,
> I queried CarbonData via Presto but got the error: tableCacheModel.carbonTable
> should not be null.
> Any idea about this issue?
>
>
> chenxingyu




-- 
*Bhavya Aggarwal*
Sr. Director
Knoldus Inc. <http://www.knoldus.com/>
+91-9910483067
Canada - USA - India - Singapore
<https://in.linkedin.com/company/knoldus> <https://twitter.com/Knolspeak>
<https://www.facebook.com/KnoldusSoftware/> <https://blog.knoldus.com/>


Re: [Discussion] Carbon Local Dictionary Support

2018-06-07 Thread Bhavya Aggarwal
Hi Vishal,

Thanks for sharing the design. I have one question related to deciding
whether to generate the dictionary or not: if in the first few loads the
cardinality is below the threshold, we will create a local dictionary, but
if in subsequent loads the threshold is breached, then what will happen to
the data of the previous loads?
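The question above can be made concrete with a small sketch. This is illustrative Java, not CarbonData's actual implementation (the class and field names are assumptions): the point is that a local dictionary is scoped to a single load, so a later load breaching the threshold would only change the encoding of that load, while earlier loads keep their dictionaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not CarbonData's API): each load is encoded
// independently, with a fallback to plain encoding when the local
// cardinality threshold is breached during that load.
public class LocalDictionarySketch {
    static final int THRESHOLD = 3;  // assumed small threshold for the demo

    static Map<String, Object> encodeLoad(List<String> values) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        List<Integer> ids = new ArrayList<>();
        for (String v : values) {
            Integer id = dict.get(v);
            if (id == null) {
                if (dict.size() >= THRESHOLD) {
                    // Threshold breached: fall back to plain encoding for
                    // THIS load only; previously written loads are untouched.
                    Map<String, Object> plain = new HashMap<>();
                    plain.put("encoding", "plain");
                    plain.put("values", values);
                    return plain;
                }
                id = dict.size();
                dict.put(v, id);
            }
            ids.add(id);
        }
        Map<String, Object> encoded = new HashMap<>();
        encoded.put("encoding", "dictionary");
        encoded.put("ids", ids);
        encoded.put("dictionary", dict);
        return encoded;
    }

    public static void main(String[] args) {
        // Load 1 stays under the threshold -> dictionary encoded.
        System.out.println(encodeLoad(List.of("a", "b", "a")).get("encoding"));
        // Load 2 breaches the threshold -> plain encoded; load 1 unaffected.
        System.out.println(encodeLoad(List.of("a", "b", "c", "d")).get("encoding"));
    }
}
```

Under this per-load scoping a reader consults each load's own encoding metadata, so mixed loads can coexist; whether earlier loads must also be rewritten is exactly the question raised above.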

Regards
Bhavya

On Thu, Jun 7, 2018 at 5:28 PM, xuchuanyin  wrote:

> About query filtering
>
> 1. “during filter, actual filter values will be generated using column
> local
> dictionary values...then filter will be applied on the dictionary encode
> data”
> ---
> If the filter is not 'equal' but 'like' or 'greater than', can it also run
> on the encoded data?
>
> 2. "As dictionary data will be always of 4 bytes "
> ---
> Why are they 4 bytes?
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>
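On question 1 above: an equality filter can be evaluated entirely on the encoded data by translating the filter literal to its dictionary id once. A hedged sketch (illustrative names, not the actual filter executor):

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Illustrative sketch (not the actual filter executor): filters are
// evaluated on dictionary ids instead of decoded values.
public class DictionaryFilterSketch {
    // Count rows whose dictionary id satisfies the predicate.
    static int countMatching(int[] ids, IntPredicate predicate) {
        int matches = 0;
        for (int id : ids) {
            if (predicate.test(id)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Dictionary sorted by value, so id order == value order.
        String[] dict = {"beijing", "shenzhen", "wuhan"};  // id = array index
        int[] encoded = {1, 2, 0, 1};                      // column data as ids

        // Equality filter city = 'shenzhen': one dictionary lookup,
        // then pure integer comparisons on the encoded column.
        int target = Arrays.binarySearch(dict, "shenzhen");
        System.out.println(countMatching(encoded, id -> id == target)); // 2

        // A range filter like city > 'shenzhen' can also run on ids, but
        // only because this dictionary assigns ids in sorted value order;
        // with an unsorted dictionary the rows would need decoding first.
        System.out.println(countMatching(encoded, id -> id > target));  // 1
    }
}
```

A 'like' filter generally cannot be reduced to an id comparison; it can still avoid per-row decoding by evaluating the pattern once per distinct dictionary entry and collecting the matching ids.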





Re: [ANNOUNCE] Zhichao Zhang as new Apache CarbonData committer

2018-05-02 Thread Bhavya Aggarwal
Congrats Zhichao!


Regards
Bhavya

On Wed, May 2, 2018 at 8:37 AM, Lionel CL  wrote:

> Congrats Zhichao!
>
> Best Regards,
> CaoLu
>
> From: Liang Chen >
> Reply-To: "u...@carbondata.apache.org" <
> u...@carbondata.apache.org>
> Date: Wednesday, May 2, 2018, 10:59 AM
> To: "dev@carbondata.apache.org" <
> dev@carbondata.apache.org>, "
> u...@carbondata.apache.org" <
> u...@carbondata.apache.org>
> Subject: [ANNOUNCE] Zhichao Zhang as new Apache CarbonData committer
>
> Hi all
>
> We are pleased to announce that the PMC has invited Zhichao Zhang as new
> Apache CarbonData committer, and the invite has been accepted!
>
> Congrats to Zhichao Zhang and welcome aboard.
>
> Regards
> Apache CarbonData PMC
>
>


Re: [ANNOUNCE] Kumar Vishal as new PMC for Apache CarbonData

2018-01-10 Thread Bhavya Aggarwal
Congratulations Vishal..

Regards
Bhavya

On 10-Jan-2018 2:47 PM, "xm_zzc" <441586...@qq.com> wrote:

>  Congratulations Vishal !
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>


Re: [ANNOUNCE] Kunal Kapoor as new Apache CarbonData committer

2018-01-08 Thread Bhavya Aggarwal
Congrats Kunal !!

Thanks
Bhavya

On Tue, Jan 9, 2018 at 12:49 PM, Simarpreet Kaur Monga <
simarpr...@knoldus.in> wrote:

> Congratulations Kunal !!
>
> On Tue, Jan 9, 2018 at 10:45 AM, Shivangi Gupta  >
> wrote:
>
> > Congratulations Kunal 
> >
> > On Mon, Jan 8, 2018 at 8:31 PM, Ravindra Pesala 
> > wrote:
> >
> > > Congrats Kunal
> > >
> > > Regards,
> > > Ravindra
> > >
> > > On 8 January 2018 at 20:29, xm_zzc <441586...@qq.com> wrote:
> > >
> > > > Congratulations Kunal  !!
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556
> .
> > > > n5.nabble.com/
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Ravi
> > >
> >
>
>
>
> --
> Regards
> Simarpreet Kaur
> Software Consultant
> Knoldus Software LLP
>


Re: carbondata with presto unsupported Complex Types

2017-12-11 Thread Bhavya Aggarwal
Hi Dylan,

Yes, we have plans to support complex types in the Presto integration in the
near future; we will update you once it is done.

Regards
Bhavya

On Mon, Dec 11, 2017 at 1:36 PM, dylan  wrote:

> hi anubhavtarar:
> thanks for your reply.
> Do you have plans to support complex types? Although not commonly
> used, they are sometimes necessary.
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>


Re: sparksql query result is not same as persto on same sql

2017-12-01 Thread Bhavya Aggarwal
Thanks Dylan, We are looking into the problem.

Regards
Bhavya

On Fri, Dec 1, 2017 at 2:25 PM, dylan  wrote:

> hello all:
>  I am using CarbonData version 1.2.0 and Spark version 1.6.0.
>  In my test case:
>    1. Creating a table:
>   cc.sql("create table IF NOT EXISTS test.table5(id string,name
> String,city String,age int) stored by 'carbondata'
> tblproperties('DICTIONARY_INCLUDE' = 'age')")
>
>   2.load csv data into table,data like this:
> id,name,city,age
> 1,david,shenzhen,31
> 88,eason,shenzhen,27
> 3,jarry,wuhan,35
>
>    3. Select from SparkSQL, the result is:
> +-----+--------+-----------+------+
> | id  | name   | city      | age  |
> +-----+--------+-----------+------+
> | 1   | david  | shenzhen  | 31   |
> | 3   | jarry  | wuhan     | 35   |
> | 88  | eason  | shenzhen  | 27   |
> +-----+--------+-----------+------+
>    This result is correct.
>
>    4. Select from Presto, the result is:
>  id | name  | city     | age
> ----+-------+----------+-----
>  1  | david | shenzhen |   3
>  3  | jarry | wuhan    |   4
>  88 | eason | shenzhen |   2
> (3 rows)
>    Look at the age field: it is wrong.
>
> I know why this happens: because I used dictionary encoding on the age
> field.
>
> Can anyone help me with this problem?
>
>
>
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>
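For readers hitting the same symptom: the ages 3, 4, 2 look like raw dictionary surrogate keys being returned without decoding. A toy illustration of that failure mode (the dictionary contents and key values here are assumptions for the demo, not CarbonData's actual key assignment):

```java
import java.util.Arrays;
import java.util.Map;

// Toy illustration: the bug shape where a reader emits raw surrogate
// keys instead of decoding them through the column dictionary.
public class SurrogateKeyBug {
    static String[] decode(int[] surrogateKeys, Map<Integer, String> dict) {
        String[] out = new String[surrogateKeys.length];
        for (int i = 0; i < surrogateKeys.length; i++) {
            out[i] = dict.get(surrogateKeys[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary for the 'age' column; the low ids and
        // sorted assignment are assumed purely for illustration.
        Map<Integer, String> dict = Map.of(2, "27", 3, "31", 4, "35");
        int[] surrogateKeys = {3, 4, 2};  // rows whose real ages are 31, 35, 27

        // Buggy read path: returns the keys themselves.
        System.out.println(Arrays.toString(surrogateKeys));               // [3, 4, 2]
        // Correct read path: decode each key through the dictionary.
        System.out.println(Arrays.toString(decode(surrogateKeys, dict))); // [31, 35, 27]
    }
}
```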


Blog on how to use Carbondata with Presto

2017-11-22 Thread Bhavya Aggarwal
Hi All,

Please look at the blog to see how we can use CarbonData with Presto.


https://blog.knoldus.com/2017/11/20/integrating-presto-with-carbondata/




Thanks and regards
Bhavya


Re: Error while creating table in carbondata

2017-11-06 Thread Bhavya Aggarwal
Hi,

I think the problem is that the class signatures of open-source Spark and
Cloudera Spark do not match for the CatalogTable class; there is an
additional parameter in the Cloudera Spark version, shown below. We may
have to try building CarbonData against the Cloudera Spark version to make
it work.

case class CatalogTable(
    identifier: TableIdentifier,
    tableType: CatalogTableType,
    storage: CatalogStorageFormat,
    schema: StructType,
    provider: Option[String] = None,
    partitionColumnNames: Seq[String] = Seq.empty,
    bucketSpec: Option[BucketSpec] = None,
    owner: String = "",
    createTime: Long = System.currentTimeMillis,
    lastAccessTime: Long = -1,
    properties: Map[String, String] = Map.empty,
    stats: Option[Statistics] = None,
    viewOriginalText: Option[String] = None,
    viewText: Option[String] = None,
    comment: Option[String] = None,
    unsupportedFeatures: Seq[String] = Seq.empty,
    tracksPartitionsInCatalog: Boolean = false,
    schemaPreservesCase: Boolean = true) {


Thanks and regards
Bhavya

On Tue, Nov 7, 2017 at 7:17 AM, Lionel CL <whuca...@outlook.com> wrote:

> mvn -DskipTests -Pspark-2.1 clean package
> The pom file was changed as which provided in former email.
>
>
>
> On 2017/11/6 at 7:47 PM, "Bhavya Aggarwal" <bha...@knoldus.com> wrote:
>
> >Hi,
> >
> >Can you please let me know how you are building the CarbonData assembly
> >jar, or which command you are running to build CarbonData?
> >
> >Regards
> >Bhavya
> >
> >On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL <whuca...@outlook.com> wrote:
> >
> >> Yes, there is a catalyst jar under the path
> /opt/cloudera/parcels/SPARK2/
> >> lib/spark2/jars/
> >>
> >> spark-catalyst_2.11-2.1.0.cloudera1.jar
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" <bha...@knoldus.com> wrote:
> >>
> >> >Hi,
> >> >
> >> >Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
> >> >folder for your  cloudera version, if its not there please try to
> include
> >> >it and retry.
> >> >
> >> >Thanks and regards
> >> >Bhavya
> >> >
> >> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL <whuca...@outlook.com>
> wrote:
> >> >
> >> >> I have the same problem in CDH 5.8.0
> >> >> spark2 version is 2.1.0.cloudera1
> >> >> carbondata version 1.2.0.
> >> >>
> >> >> There's no error occurred when using open source version spark.
> >> >>
> >> >> 2.6.0-cdh5.8.0
> >> >> 2.1.0.cloudera1
> >> >> 2.11
> >> >> 2.11.8
> >> >>
> >> >>
> >> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating
> >> Table
> >> >> with Database name [default] and Table name [t111]
> >> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
> >> >> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
> >> >> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
> >> >> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
> >> >> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
> >> >> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
> >> >> Ljava/lang/String;JJLscala/collection/immutable/Map;
> >> >> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
> >> >> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
> >> >> catalog/CatalogTable;
> >> >>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> >> bonSchema(CarbonSource.scala:253)
> >> >>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
> >> >> DDLStrategy.scala:135)
> >> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> >> $1.apply(QueryPlanner.scala:62)
> >> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> >> $1.apply(QueryPlanner.scala:62)
> >> >>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> >>
> >> >>
> >> On 2017/11/1 at 1:

Re: Error while creating table in carbondata

2017-11-06 Thread Bhavya Aggarwal
Hi,

Can you please let me know how you are building the CarbonData assembly
jar, or which command you are running to build CarbonData?

Regards
Bhavya

On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL <whuca...@outlook.com> wrote:

> Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/
> lib/spark2/jars/
>
> spark-catalyst_2.11-2.1.0.cloudera1.jar
>
>
>
>
>
>
>
> On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" <bha...@knoldus.com> wrote:
>
> >Hi,
> >
> >Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
> >folder for your  cloudera version, if its not there please try to include
> >it and retry.
> >
> >Thanks and regards
> >Bhavya
> >
> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL <whuca...@outlook.com> wrote:
> >
> >> I have the same problem in CDH 5.8.0
> >> spark2 version is 2.1.0.cloudera1
> >> carbondata version 1.2.0.
> >>
> >> There's no error occurred when using open source version spark.
> >>
> >> 2.6.0-cdh5.8.0
> >> 2.1.0.cloudera1
> >> 2.11
> >> 2.11.8
> >>
> >>
> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating
> Table
> >> with Database name [default] and Table name [t111]
> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
> >> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
> >> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
> >> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
> >> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
> >> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
> >> Ljava/lang/String;JJLscala/collection/immutable/Map;
> >> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
> >> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
> >> catalog/CatalogTable;
> >>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> bonSchema(CarbonSource.scala:253)
> >>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
> >> DDLStrategy.scala:135)
> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >>
> >>
> >> On 2017/11/1 at 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
> >>
> >> Hi
> >>
> >> Did you use open source spark version?
> >>
> >> Can you provide more detail info :
> >> 1. which carbondata version and spark version, you used ?
> >> 2. Can you share with us , reproduce script and steps.
> >>
> >> Regards
> >> Liang
> >>
> >>
> >> hujianjun wrote
> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id
> string,name
> >> string,city string,age Int)STORED BY 'carbondata'")
> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand:
> >> [master][root][Thread-1]Creating Table with Database name [clb_carbon]
> and
> >> Table name [carbon_table]
> >> java.lang.NoSuchMethodError:
> >> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg
> >> /apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/
> >> spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
> >> spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/
> >> apache/spark/sql/types/StructType;Lscala/Option;Lscala/
> >> collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/
> >> collection/immutable/Map;Lscala/Option;Lscala/Option;
> >> Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/
> >> apache/spark/sql/catalyst/catalog/CatalogTable;
> >>at
> >> org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> bonSchema(CarbonSource.scala:253)
> >>at
> >> org.apache.spark.sql.execution.strategy.DDLStrategy.apply(
> >> DDLStrategy.scala:154)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>at
> >> org.apache.

Version upgrade for Presto Integration to 0.186

2017-11-02 Thread Bhavya Aggarwal
Hi All,

Presto 0.186 has a lot of improvements that will increase
performance and improve reliability. Some of the major issues and
improvements are listed below.


   - Fix excessive GC overhead caused by map to map cast.
   - Fix issue that may cause queries containing expensive functions, such
   as regular expressions, to continue using CPU resources even after they are
   killed.
   - Fix performance issue caused by redundant casts
   - Fix leak in running query counter for failed queries. The counter
   would increment but never decrement for queries that failed before starting.
   - Reduce memory usage when building data of VARCHAR or VARBINARY types.
   - Estimate memory usage for GROUP BY more precisely to avoid out of
   memory errors.
   - Add spill-to-disk support for joins.

Currently, the Presto version that we are using in CarbonData is 0.166; I
would like to suggest upgrading it to 0.186. Please let me know what the
group thinks about it.


Regards

Bhavya


Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-17 Thread Bhavya Aggarwal
Hi Dev,

For pre-aggregate tables, how will we handle subsequent loads? Will we run
the query on the whole table, calculate the aggregations again, and then
delete the existing segment and create new segments for the whole data?
With that approach, as the data in the main table increases, the loading
time will also increase substantially. The other way is to intelligently
determine the new values by querying only the latest segment and using
them in combination with the existing pre-aggregated tables.
Please share your thoughts about it in this discussion.
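One way to picture the "query only the latest segment" alternative is incremental, per-segment aggregation with a cheap merge step. A rough sketch (illustrative Java, assuming the aggregate is decomposable, as SUM is):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: aggregate each new segment in isolation, then merge the
// per-segment aggregates, so old segments are never rescanned.
public class IncrementalPreAggSketch {
    // Aggregate one segment's (key, value) rows into key -> sum.
    static Map<String, Long> aggregateSegment(List<String[]> rows) {
        Map<String, Long> agg = new HashMap<>();
        for (String[] row : rows) {
            agg.merge(row[0], Long.parseLong(row[1]), Long::sum);
        }
        return agg;
    }

    // Merging works because SUM is decomposable; the same holds for
    // COUNT/MIN/MAX, while AVG needs (sum, count) pairs instead.
    static Map<String, Long> mergeAggregates(List<Map<String, Long>> segments) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> seg : segments) {
            seg.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> segment1 = new ArrayList<>();
        segment1.add(new String[]{"shenzhen", "31"});
        segment1.add(new String[]{"wuhan", "35"});
        List<String[]> segment2 = new ArrayList<>();
        segment2.add(new String[]{"shenzhen", "27"});

        List<Map<String, Long>> segmentAggs = new ArrayList<>();
        segmentAggs.add(aggregateSegment(segment1));  // load 1
        segmentAggs.add(aggregateSegment(segment2));  // load 2: only new data
        System.out.println(mergeAggregates(segmentAggs).get("shenzhen")); // 58
    }
}
```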

Regards
Bhavya

On Mon, Oct 16, 2017 at 4:53 PM, Liang Chen  wrote:

> +1 , i agree with Jacky points.
> As we know, carbondata already be able to get very good performance for
> filter query scenarios through MDK index.  supports pre-aggregate in 1.3.0
> would improve aggregated query scenarios.   so users can use one carbondata
> to support all query cases(both filter and agg).
>
> To Lu cao, you mentioned this solution to build cube schema, it is too
> complex and there are many limitations, for example: the CUBE data can't
> support query detail data etc.
>
> Regards
> Liang
>
>
> Jacky Li wrote
> > Hi Lu Cao,
> >
> > In my previous experience with “cube” engines, no matter whether ROLAP
> > or MOLAP, a cube is something above the SQL layer: not only does it need
> > the user to establish a cube schema by transforming metadata from the
> > data warehouse star schema, but the engine also defines its own query
> > language, like MDX, and many times these languages are not standardized,
> > so different vendors need to provide different BI tools or adaptors for
> > it.
> > So, although some vendors provide easy-to-use cube management tools, this
> > at least has two problems: vendor lock-in and the rigidity of the cube
> > model once it is defined. I think these problems are similar in other
> > vendor-specific solutions.
> >
> > Currently, one of the strengths that the carbon store provides is that
> > it complies with standard SQL by integrating with SparkSQL, Hive, etc.
> > The intention of providing pre-aggregate table support is that it can
> > enable carbon to improve OLAP query performance while still sticking
> > with standard SQL; it means all users can still use the same BI/JDBC
> > applications/tools which can connect to SparkSQL, Hive, etc.
> >
> > If carbon were to support “cube”, it would not only need to define its
> > configuration, which may be very complex and non-standard, but would
> > also force users to use vendor-specific tools for management and
> > visualization. So, I think before going to this complexity, it is better
> > to provide pre-agg tables as the first step.
> >
> > Although we do not want the full complexity of “cube” on arbitrary data
> > schemas, one special case is timeseries data. Because the time dimension
> > hierarchy (year/month/day/hour/minute/second) is naturally
> > understandable and consistent in all scenarios, we can provide native
> > support for pre-aggregate tables on the time dimension. Actually, it is
> > a cube on time, and we can do automatic rollup for all levels in time.
> >
> > Finally, please note that, by using CTAS syntax, we are not restricting
> > carbon to support pre-aggregate tables only, but also arbitrary
> > materialized views, if we want in the future.
> >
> > Hope this make things more clear.
> >
> > Regards,
> > Jacky
> >
> >
> >
> > Actually, as you can see in the document, I am avoiding calling this
> > “cube”.
> >
> >
> >> 在 2017年10月15日,下午9:18,Lu Cao 
>
> > whucaolu@
>
> >  写道:
> >>
> >> Hi Jacky,
> >> If a user wants to create a cube on the main table, does he/she have to
> >> create multiple pre-aggregate tables? It will be a heavy workload to
> >> write so many CTAS commands. If the user only needs to create a few
> >> pre-agg tables, current carbon can already support this requirement:
> >> the user can create the table first and then use an insert-into-select
> >> statement. The only difference is that the user needs to query the
> >> pre-agg table instead of the main table.
> >>
> >> So maybe we can enable the user to create a cube model (in the schema
> >> or a metafile?) which contains multiple pre-aggregation definitions,
> >> and carbon can create those pre-agg tables automatically according to
> >> the model. That would be easier to use and maintain.
> >>
> >> Regards,
> >> Lionel
> >>
> >> On Sun, Oct 15, 2017 at 3:56 PM, Jacky Li 
>
> > jacky.likun@
>
> >  wrote:
> >>
> >>> Hi Liang,
> >>>
> >>> For alter table, data update/delete, and delete segment, they are the
> >>> same.
> >>> So I write in document “ User can manually perform this operation and
> >>> rebuild pre-aggregate table as
> >>> update scenario”
> >>> User need to drop the associated aggregate table and perform alter
> >>> table,
> >>> or data update/delete, or delete segment operation, then he can create
> >>> the
> >>> pre-agg table using CTAS command again, and the pre-aggregate table
> will
> >>> be

Re: [DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-10 Thread Bhavya Aggarwal
+1

Regards
Bhavya

On Tue, Oct 10, 2017 at 7:15 PM, jarray  wrote:

> +1
>
>
>
>
>
>
> On 10/10/2017 14:13, gururajshetty wrote:
> + 1
>
> Regards,
> Gururaj
>
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>


Re: [VOTE] Apache CarbonData 1.2.0(RC3) release

2017-09-22 Thread Bhavya Aggarwal
+1

Thanks and regards
Bhavya

On Fri, Sep 22, 2017 at 5:51 PM, Ravindra Pesala 
wrote:

> Hi
>
> I submit the Apache CarbonData 1.2.0 (RC3) to your vote.
>
> 1.Release Notes:
> *https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12320220&version=12340260
> <https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12340260>*
>
> Some key improvement in this patch release:
>
>    1. Sort columns feature: it enables users to specify that only the
>    required columns (those used in query filters) are sorted while loading
>    the data, which improves loading speed. Note: it currently supports all
>    data types except decimal, float, and double.
>    2. Support for 4 types of sort scope while creating a table: local
>    sort, batch sort, global sort, and no sort
>    3. Support for partitioning
>    4. Optimized data update and delete for Spark 2.1
>    5. Further performance improvement by optimizing the measure filter
>    feature
>    6. DataMap framework to add custom indexes
>    7. Ecosystem feature 1: support for Presto integration
>    8. Ecosystem feature 2: support for Hive integration
>
>
>  2. The tag to be voted upon : apache-carbondata-1.2.0-rc3(commit:
> 09e07296a8e2a94ce429f6af333a9b15abb785de)
> *https://github.com/apache/carbondata/releases/tag/
> apache-carbondata-1.2.0-rc3*
>
> 3.The artifacts to be voted on are located here:
> *https://dist.apache.org/repos/dist/dev/carbondata/1.2.0-rc3/
> *
>
> 4. A staged Maven repository is available for review at:
> *https://repository.apache.org/content/repositories/
> orgapachecarbondata-1023/
>  orgapachecarbondata-1023/>*
>
> 5. Release artifacts are signed with the following key:
> *https://people.apache.org/keys/committer/ravipesala.asc
> *
>
> Please vote on releasing this package as Apache CarbonData 1.2.0,  The vote
> will be open for the next 72 hours and passes if a majority of
> at least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache CarbonData 1.2.0
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Regards,
> Ravindra.
>


Re: [ANNOUNCE] Lu Cao as new Apache CarbonData committer

2017-09-13 Thread Bhavya Aggarwal
Congrats Lu Cao ..

Thanks and regards
Bhavya

On Wed, Sep 13, 2017 at 7:30 PM, Raghunandan S <
carbondatacontributi...@gmail.com> wrote:

> Congrats lu cao.
> On Wed, 13 Sep 2017 at 7:18 PM, Liang Chen 
> wrote:
>
> > Hi all
> >
> > We are pleased to announce that the PMC has invited Lu Cao as new
> > Apache
> > CarbonData committer, and the invite has been accepted !
> >
> >Congrats to Lu Cao and welcome aboard.
> >
> > Regards
> > The Apache CarbonData PMC
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>


Re: [ANNOUNCE] Manish Gupta as new Apache CarbonData

2017-08-27 Thread Bhavya Aggarwal
Congrats Manish.

Regards
Bhavya
On 28-Aug-2017 8:31 am, "xm_zzc" <441586...@qq.com> wrote:

> Congratulations Manish!!!
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/ANNOUNCE-Manish-
> Gupta-as-new-Apache-CarbonData-tp20750p20804.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>


Re: Presto+CarbonData optimization work discussion

2017-07-25 Thread Bhavya Aggarwal
I have created a pull request 1190  for Presto Optimization where we have
done following changes to improve the performance

1. Removed unnecessary loops from the integration code to make it more
efficient.
2. Implemented Lazy Blocks as is being used in case of ORC.
3. Improved dictionary decoding to have better results.

I have run this on my local machine for 2 GB data and results are attached
with this email, we see an improvement in almost all TPCH queries that we
have run.
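Point 2 (lazy blocks) can be sketched as deferring the expensive page decode until an operator actually touches the block, so blocks pruned by a filter are never materialized. Presto has its own LazyBlock abstraction; the class below is only an illustrative stand-in:

```java
import java.util.function.Supplier;

// Illustrative lazy block: decoding work is deferred until the first
// actual read, so blocks that a filter prunes are never materialized.
public class LazyBlockSketch {
    static int loads = 0;  // counts how many expensive decodes happened

    static final class LazyBlock {
        private final Supplier<int[]> loader;
        private int[] values;  // materialized on first access

        LazyBlock(Supplier<int[]> loader) {
            this.loader = loader;
        }

        int get(int position) {
            if (values == null) {
                values = loader.get();  // the expensive decode happens here
            }
            return values[position];
        }
    }

    public static void main(String[] args) {
        LazyBlock block = new LazyBlock(() -> {
            loads++;  // track each expensive decode
            return new int[]{10, 20, 30};
        });
        System.out.println(loads);        // 0: nothing decoded yet
        System.out.println(block.get(1)); // 20: decode happens now
        block.get(2);
        System.out.println(loads);        // 1: decoded exactly once
    }
}
```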

Thanks and regards
Bhavya

On Thu, Jul 20, 2017 at 12:21 PM, rui qin  wrote:

> Regarding point 6: Spark has the vectorized feature, but Presto does not.
> How to implement it?
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/Presto-
> CarbonData-optimization-work-discussion-tp18509p18548.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>


PrestoQueryResults.xlsx
Description: MS-Excel 2007 spreadsheet


[Discussion] Using Lazy Dictionary Decode for Presto Integration

2017-07-18 Thread Bhavya Aggarwal
We were trying Presto with CarbonData, and in the current code CarbonData
decodes the dictionary values into actual values as soon as the data is
read from CarbonData. I think that if we do a lazy decode of the
dictionary values after aggregation, it will make the queries faster.
Please let me know if anybody has thoughts about decoding only when
calculating the final results.
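The idea sketched concretely (illustrative Java, not the actual connector code): aggregate on the integer dictionary ids and decode only the distinct group keys at the end, i.e. one decode per group instead of one per row:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of lazy dictionary decode: group by raw dictionary ids, then
// decode once per distinct group instead of once per row.
public class LazyDecodeSketch {
    static Map<String, Integer> groupCountDecoded(int[] ids, String[] dictionary) {
        // Aggregate on cheap int keys first.
        Map<Integer, Integer> countsById = new TreeMap<>();
        for (int id : ids) {
            countsById.merge(id, 1, Integer::sum);
        }
        // Decode only the distinct group keys at the very end.
        Map<String, Integer> result = new LinkedHashMap<>();
        countsById.forEach((id, count) -> result.put(dictionary[id], count));
        return result;
    }

    public static void main(String[] args) {
        String[] dictionary = {"beijing", "shenzhen", "wuhan"};
        int[] cityIds = {1, 1, 2, 1, 2};  // one dictionary id per row
        System.out.println(groupCountDecoded(cityIds, dictionary));
        // {shenzhen=3, wuhan=2}
    }
}
```

With millions of rows but only a handful of distinct groups, the per-row work stays in integer space and the string decode cost becomes proportional to the number of groups.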

Thanks and regards
Bhavya


Re: [DISCUSSION] CarbonData Integration with Presto

2017-07-10 Thread Bhavya Aggarwal
Hi linqer,

I am trying to run with the configuration that you suggested on 500 GB of
data; can you please verify the properties below?

*config.properties*
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8086
query.max-memory=5GB
#query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://xx.xx.xx.xx:8086


*jvm.config*
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=1G

Regards
Bhavya

On Mon, Jul 3, 2017 at 11:45 AM, Bhavya Aggarwal <bha...@knoldus.com> wrote:

> Thanks Linquer,
>
> I will try with the above option and we tried it with ORC as well , here
> are the comparison results for same query with 50 GB of data and same
> configuration that I mentioned earlier.
>
> [image: Inline image 1]
>
> Thanks and regards
> Bhavya
>
>
>
> On Mon, Jul 3, 2017 at 9:39 AM, linqer <26304...@qq.com> wrote:
>
>> Hi
>> 1. You can set -XX:ReservedCodeCacheSize in jvm.config; make it big.
>> 2. I don't see your discovery.uri IP; make sure the Master acts as the
>> coordinator and discovery server.
>> 3. Remove query.max-memory-per-node first.
>> 4. You can test ORC; Parquet is not supported very well on Presto.
>>
>> thanks
>>
>>
>>
>> --
>> View this message in context: http://apache-carbondata-dev-m
>> ailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonD
>> ata-Integration-with-Presto-tp16793p17055.html
>> Sent from the Apache CarbonData Dev Mailing List archive mailing list
>> archive at Nabble.com.
>>
>
>


Re: Fwd: [DISCUSSION] CarbonData Integration with Presto

2017-07-05 Thread Bhavya Aggarwal
I am currently loading 500 GB of data; once that is completed, I will try
with your settings today.

Regards
Bhavya
On 06-Jul-2017 8:39 am, "linqer" <26304...@qq.com> wrote:

> hi, did you test orc vs carbondata with configuration as I  suggested?
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-
> CarbonData-Integration-with-Presto-tp16793p17419.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>


Re: [DISCUSSION] Propose to move notification of "jira Created" to issues@mailing list from dev

2017-07-04 Thread Bhavya Aggarwal
+1
Agreed, these should be two separate mailing lists.

Thanks and Regards
Bhavya

On Tue, Jul 4, 2017 at 5:20 PM, Venkata Gollamudi 
wrote:

> +1
> It is better to be moved
>
> Regards,
> Venkata Ramana G
>
> On Tue, Jul 4, 2017 at 4:40 PM, Kumar Vishal 
> wrote:
>
> > +1
> > Better to move to issue mailing list
> >
> > Regards
> > Kumar Vishal
> >
> > Sent from my iPhone
> >
> > > On 03-Jul-2017, at 15:02, Ravindra Pesala 
> wrote:
> > >
> > > +1
> > > Yes, we should move to issues mailing list.
> > >
> > > Regards,
> > > Ravindra.
> > >
> > >> On 30 June 2017 at 07:35, Erlu Chen  wrote:
> > >>
> > >> Agreed, we can separate discussion and created JIRA.
> > >>
> > >> It will be better for developers to filter out unnecessary messages
> > >> and focus on discussion.
> > >>
> > >> Regards.
> > >> Chenerlu.
> > >>
> > >>
> > >>
> > >> --
> > >> View this message in context: http://apache-carbondata-dev-
> > >> mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-
> > >> Propose-to-move-notification-of-jira-Created-to-issues-
> > >> mailing-list-from-dev-tp16835p16842.html
> > >> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> > >> archive at Nabble.com.
> > >>
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Ravi
> >
>


Re: [DISCUSSION] CarbonData Integration with Presto

2017-07-03 Thread Bhavya Aggarwal
Thanks Linquer,

I will try the above options. We tried it with ORC as well; here are the
comparison results for the same query with 50 GB of data and the same
configuration that I mentioned earlier.

[image: Inline image 1]

Thanks and regards
Bhavya



On Mon, Jul 3, 2017 at 9:39 AM, linqer <26304...@qq.com> wrote:

> Hi
> 1. You can set -XX:ReservedCodeCacheSize in JVM.properties; make it big.
> 2. I don't see your discovery.uri IP; make sure the Master acts as the
> coordinator and discovery server.
> 3. Remove query.max-memory-per-node first.
> 4. You can test ORC; Parquet is not supported very well on Presto.
>
> thanks
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-
> CarbonData-Integration-with-Presto-tp16793p17055.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>
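The tuning points quoted above can be sketched as a minimal Presto
coordinator configuration. This is an illustrative sketch only; the
hostname, port, and memory sizes below are assumptions, not values taken
from this thread:

```properties
# etc/config.properties on the master node, which acts as both the
# coordinator and the discovery server (point 2 above).
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8086
discovery-server.enabled=true
# Hypothetical hostname; must match the master's address.
discovery.uri=http://master-host:8086

# Per point 3, leave query.max-memory-per-node unset at first and tune later.
query.max-memory=20GB
```

The code cache suggestion (point 1) would go into etc/jvm.config as a JVM
flag, e.g. -XX:ReservedCodeCacheSize=512M (size is an assumption).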


Re: [DISCUSSION] CarbonData Integration with Presto

2017-07-03 Thread Bhavya Aggarwal
Hi Liang,

We went through the Presto documentation, and there were some issues with
Presto 0.166 that were resolved in later versions. There is also a large
performance improvement between 0.166 and 0.179, as joins are planned
differently from 0.166 onwards. The two most important reasons why we chose
to run on 0.179 are below.

1. Fixed an issue that could cause incorrect results when processing
dictionary-encoded data: if an expression can fail on bad input, the
results from filtered-out rows containing bad input may be included in the
query output.

2. The order in which joins are executed in a query can have a significant
impact on the query's performance. The aspect of join ordering with the
largest impact on performance is the size of the data being processed and
passed over the network. If a join is not a primary key-foreign key join,
the data produced can be much greater than the size of either input table,
up to |Table 1| x |Table 2| for a cross join. If a join that produces a lot
of data is performed early in the execution, subsequent stages will need to
process large amounts of data for longer than necessary, increasing the
time and resources needed for the query and potentially causing query
failure. This issue has been addressed in Presto from release 0.178 onwards.
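The effect of join ordering described above can be illustrated with a
back-of-the-envelope sketch; the table names and row counts below are
made-up illustrative numbers, not measurements from this thread:

```python
# Hypothetical row counts for three tables (illustrative only).
orders, lineitem, nation = 1_500_000, 6_000_000, 25

# Upper bound on the rows a join can produce when there is no
# primary key-foreign key relationship: the full cross product.
def worst_case_rows(left_rows: int, right_rows: int) -> int:
    return left_rows * right_rows

# Joining the two big tables first creates an enormous intermediate
# result that every later stage must process and ship over the network.
big_first = worst_case_rows(orders, lineitem)    # 9,000,000,000,000 rows

# Joining a big table with the tiny one first keeps the intermediate small.
small_first = worst_case_rows(orders, nation)    # 37,500,000 rows

print(big_first // small_first)  # 240000x more intermediate data
```

A cost-based planner tries to pick the second ordering automatically, which
is why the join-reordering fixes in later Presto releases matter here.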

Thanks and regards
Bhavya


On Sun, Jul 2, 2017 at 10:41 AM, Liang Chen <chenliang...@apache.org> wrote:

> Hi Bhavya
>
> Currently, 1.2.0 proposes to support Presto version 0.166.
> Is there any performance difference between 0.179 and 0.166?
>
> Regards
> Liang
>
>
> 2017-07-01 13:12 GMT+08:00 Bhavya Aggarwal <bha...@knoldus.com>:
>
> > Hi,
> >
> > Please find the configuration settings that we used attached with this
> > email; we are running Presto Server 0.179 for our testing.
> >
> > Thanks and regards
> > Bhavya
> >
> > On Fri, Jun 30, 2017 at 8:23 AM, linqer <26304...@qq.com> wrote:
> >
> >> Can you give me all your configuration files (etc/) for the coordinator
> >> and worker? What configuration tuning did you make for CarbonData?
> >>
> >> Many scenarios in our company use Presto. I have tested Presto reading
> >> ORC vs CarbonData, and CarbonData is clearly inferior to ORC.
> >>
> >> During the testing, for performance reasons, we used replicated joins in
> >> particular, and we increased the JVM code cache size, but these settings
> >> are equally fair to both ORC and CarbonData.
> >>
> >> Carbondata_vs_ORC_on_Presto_Benchmark_testing.docx
> >> <http://apache-carbondata-dev-mailing-list-archive.1130556.n
> >> 5.nabble.com/file/n16849/Carbondata_vs_ORC_on_Presto_Benchma
> >> rk_testing.docx>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context: http://apache-carbondata-dev-m
> >> ailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonD
> >> ata-Integration-with-Presto-tp16793p16849.html
> >> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> >> archive at Nabble.com.
> >>
> >
> >
>


[DISCUSSION] Whether Carbondata should work with Presto in the next release version(1.2.0)

2017-06-11 Thread Bhavya Aggarwal
Hi All,

We can add Presto integration as one of the items for the 1.2.0 release:
support for Presto to read from CarbonData, as Presto is used by many
people for query execution. Please vote and discuss Presto integration in
this mail thread.


Thanks and regards
Bhavya


[DISCUSSION] Whether Carbondata should support Hive in the next release version(1.2.0)

2017-06-11 Thread Bhavya Aggarwal
Hi Guys,

Should we add Hive integration with CarbonData in release 1.2.0? It would
be good if we could come up with the features that need to be supported in
the Hive integration. Please vote and give your comments in this
discussion.


Thanks and regards
Bhavya


Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Bhavya Aggarwal
Congrats Ravindra,

Regards
Bhavya

On Fri, May 19, 2017 at 4:56 PM, Liang Chen  wrote:

> Hi all
>
> We are pleased to announce that the PMC has invited Ravindra as new Apache
> CarbonData PMC member, and the invite has been accepted !
>
> Congrats to Ravindra and welcome aboard.
>
> Thanks
> The Apache CarbonData team
>


Re: [ANNOUNCE] Cai Qiang as new Apache CarbonData committer

2017-05-17 Thread Bhavya Aggarwal
Congrats David...

Regards
Bhavya

On Wed, May 17, 2017 at 9:50 PM, Naresh P R 
wrote:

> Congrats David !!!
> 
> Regards,
> Naresh P R
>
> On May 17, 2017 7:05 PM, "Liang Chen"  wrote:
>
> Hi all
>
> We are pleased to announce that the PMC has invited Cai Qiang as new Apache
> CarbonData committer, and the invite has been accepted !
>
> Congrats to Cai Qiang and welcome aboard.
>
> Regards
> Liang
>


Re: [VOTE] Apache CarbonData 1.1.0 (RC3) release

2017-05-12 Thread Bhavya Aggarwal
+1

On Sat, May 13, 2017 at 12:23 AM, Jihong Ma  wrote:

> +1
>
> Jihong
>
> -Original Message-
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Friday, May 12, 2017 12:20 AM
> To: dev@carbondata.apache.org
> Subject: Re: [VOTE] Apache CarbonData 1.1.0 (RC3) release
>
> +1(binding)
>
> LICENSE,NOTICE are ok
> no binary file
> compile is ok with spark 1.6 and 2.1
>
> *mvn clean -Pspark-1.6 package*
> [INFO] Apache CarbonData :: Parent  SUCCESS [
>  1.520 s]
> [INFO] Apache CarbonData :: Common  SUCCESS [
>  2.546 s]
> [INFO] Apache CarbonData :: Core .. SUCCESS [
> 44.153 s]
> [INFO] Apache CarbonData :: Processing  SUCCESS [
>  7.531 s]
> [INFO] Apache CarbonData :: Hadoop  SUCCESS [
>  7.117 s]
> [INFO] Apache CarbonData :: Spark Common .. SUCCESS [
> 23.700 s]
> [INFO] Apache CarbonData :: Spark . SUCCESS [03:37
> min]
> [INFO] Apache CarbonData :: Spark Common Test . SUCCESS [04:51
> min]
> [INFO] Apache CarbonData :: Assembly .. SUCCESS [
>  4.476 s]
> [INFO] Apache CarbonData :: Spark Examples  SUCCESS [
> 12.540 s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
>
> *mvn clean -Pspark-2.1 package*
> [INFO] Apache CarbonData :: Parent  SUCCESS [
>  1.884 s]
> [INFO] Apache CarbonData :: Common  SUCCESS [
>  3.198 s]
> [INFO] Apache CarbonData :: Core .. SUCCESS [
> 43.969 s]
> [INFO] Apache CarbonData :: Processing  SUCCESS [
>  8.116 s]
> [INFO] Apache CarbonData :: Hadoop  SUCCESS [
>  8.413 s]
> [INFO] Apache CarbonData :: Spark Common .. SUCCESS [
> 26.447 s]
> [INFO] Apache CarbonData :: Spark2  SUCCESS [03:12
> min]
> [INFO] Apache CarbonData :: Spark Common Test . SUCCESS [06:35
> min]
> [INFO] Apache CarbonData :: Assembly .. SUCCESS [
>  5.016 s]
> [INFO] Apache CarbonData :: Spark2 Examples ... SUCCESS [
> 12.147 s]
> [INFO]
> 
> [INFO] BUILD SUCCESS
>
>
> Regards
> Liang
>
> 2017-05-12 1:08 GMT+08:00 Ravindra Pesala :
>
> > Hi
> >
> > I submit the Apache CarbonData 1.1.0 (RC3) to your vote.
> >
> > *1. Release Notes:* https://issues.apache.org/jira/secure/ReleaseNote.
> > jspa?projectId=12320220&version=12338987
> >
> > Key features of this release are highlighted as below.
> >
> >-  Introduced new data format called V3 to improve scan performance
> (~20
> >to 50%).
> >-  Alter table support in carbondata. (Only for Spark 2.1)
> >-  Supported Batch Sort to improve data loading performance.
>    -  Improved single-pass load by upgrading to the latest Netty framework
>    and launching a dictionary client for each load.
>    -  Supported range filters, combining between-filters into one filter
>    to improve filter performance.
>    -  Many improvements done on large clusters, especially in query
>    processing.
> >-  More than 160 bugs and many improvements done in this release.
> >-
> >
> >  2. The tag to be voted upon : apache-carbondata-1.1.0-rc3
> > (commit: 88eb7e0860506bfeea3a08e1605a89dc8d5a4ab6)
> >
> > *https://github.com/apache/carbondata/commit/
> > 88eb7e0860506bfeea3a08e1605a89dc8d5a4ab6
> >  > 88eb7e0860506bfeea3a08e1605a89dc8d5a4ab6>*
> >
> > 3.The artifacts to be voted on are located here:
> https://dist.apache.org/
> > repos/dist/dev/carbondata/1.1.0-rc3/
> >
> > 4. A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/
> > orgapachecarbondata-1013
> >
> > 5. Release artifacts are signed with the following key:
> >
> > https://people.apache.org/keys/committer/ravipesala.asc
> >
> >
> > Please vote on releasing this package as Apache CarbonData 1.1.0. The
> > vote will be open for the next 72 hours and passes if a majority of
> > at least three +1 PMC votes are cast.
> >
> >
> > [ ] +1 Release this package as Apache CarbonData 1.1.0
> >
> > [ ] 0 I don't feel strongly about it, but I'm okay with the release
> >
> > [ ] -1 Do not release this package because...
> > ---
> > Thanks & Regards,
> > Ravindra.
> >
>
>
>
> --
> Regards
> Liang
>


Re: Use dev@carbondata.apache.org to test new mailing list without incubator

2017-05-03 Thread Bhavya Aggarwal
ACK

Regards
Bhavya

On Wed, May 3, 2017 at 12:31 PM, Pallavi Singh 
wrote:

> ACK
>
> 2017-05-02 11:20 GMT+05:30 Mohammad Shahid Khan <
> mohdshahidkhan1...@gmail.com>:
>
> > ACK
> >
> > On Wed, Apr 26, 2017 at 11:47 PM, Vimal Das Kammath <
> > vimaldas.kamm...@gmail.com> wrote:
> >
> > > ACK
> > >
> > > On Tue, Apr 25, 2017 at 10:14 PM, Henry Saputra <
> henry.sapu...@gmail.com
> > >
> > > wrote:
> > >
> > > > ACK
> > > > On Tue, Apr 25, 2017 at 5:54 AM Liang Chen 
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards | Pallavi Singh
> Software Consultant
> Knoldus Software LLP
> pallavi.si...@knoldus.in
> +91-9911235949
>