Re: [Discussion] Migrate CarbonData to support PrestoSQL

2019-05-09 Thread xm_zzc
I agree with this approach.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Migrate CarbonData to support PrestoSQL

2019-05-06 Thread xm_zzc
Hi:
  IMO, there are currently still many users on the original PrestoDB (0.217),
and replacing it is costly for them. I think it is better to add PrestoSQL
(310) as a new module if possible, rather than replacing the old one
directly.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Akash as new Apache CarbonData committer

2019-04-25 Thread xm_zzc
Congratulations !!!

Regards 
Zhichao



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Add new compaction type for compacting delta data file

2019-04-02 Thread xm_zzc
Hi:
  Just as I said before, we can add a new compaction type called
'iud_delta_compact' for the command 'alter table table_name compact' to
support this feature.
  The concurrency of this feature will be handled the same way as other
compaction types, and it is recommended to run this operation during off-peak
hours.

  I will create a jira task for this feature later.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Add new compaction type for compacting delta data file

2019-03-29 Thread xm_zzc
Hi, Akash R:
  Thanks for your reply.
  I am talking about the delta data files, including update and delete delta
files. Horizontal compaction just compacts all delta files into one per
segment, right? But if the size of the segment is big and the sizes of the
update and delete delta files are big too, I think it will affect the query
performance, because queries need to filter both the carbondata files and the
delta files, right?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


[DISCUSSION] Add new compaction type for compacting delta data file

2019-03-28 Thread xm_zzc
Hi dev:
  Currently CarbonData supports using the compaction command to compact delta
data into carbondata files, but it requires two or more segments to be
compacted. If these segments are big and the user doesn't want to compact them
(it takes a lot of time), he may just want to compact the delta data files
into carbondata files for each segment.
  After discussing with Jacky and David offline, there is a way to do this:
add a new compaction type for compacting the delta data files of each segment,
for example:
  alter table table_name compact 'iud_delta' where segment.id in (0.2)
This command will compact all delta data files of segment 0.2 into carbondata
files as a new segment 0.3.
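
A minimal sketch of how the proposed command could be issued through a
CarbonSession once it is implemented (the 'iud_delta' type is only the
proposal above and does not exist yet; the table name, store location and
segment id are placeholders, and the syntax mirrors the existing CUSTOM
compaction command):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .master("local[2]")
  .appName("IudDeltaCompactSketch")
  .getOrCreateCarbonSession("/tmp/carbon_store")

// Compact the update/delete delta files of segment 0.2 into regular
// carbondata files, producing a new segment 0.3.
carbon.sql("ALTER TABLE table_name COMPACT 'iud_delta' WHERE SEGMENT.ID IN (0.2)")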

  Any suggestion for this, thanks.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Carbondata performance over Parquet

2019-03-25 Thread xm_zzc
Can you give us the full create-table SQL (including its properties), and also
the SQL that shows low performance?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: carbondata timestamp has a bug

2019-03-22 Thread xm_zzc
Did you add the code below before creating the CarbonSession?

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "yyyy/MM/dd")

val spark = SparkSession
  .builder()
  .master("local[4]")
  .appName("Carbon1_5")
  .getOrCreateCarbonSession(storeLocation)





--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: How to fix

2019-03-22 Thread xm_zzc
Yep, please give us your JIRA account.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: carbondata timestamp has a bug

2019-03-22 Thread xm_zzc
Do you mean that you want to define the timestamp format? You can use the
'CARBON_TIMESTAMP_FORMAT' parameter.
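
For example, a minimal sketch (the property must be set before the
CarbonSession is created; the format string below is only an illustration):

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

// Define the expected input format for timestamp columns.
CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd HH:mm:ss")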



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: How to fix

2019-03-22 Thread xm_zzc
Hi, you can first take a look at this doc:
https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md
.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] is it necessary to support SORT_COLUMNS modification

2019-03-16 Thread xm_zzc
Nice feature. Just one suggestion: we could support converting a specified old
segment with the new SORT_COLUMNS, for example: resort table table_name for
segment 0.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: does carbondata SDK support transaction?

2019-03-16 Thread xm_zzc
The SDK writer does not support transactions now. You can create a stream
table and use Spark streaming to read data from Kafka and write into it.
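
A minimal sketch of that setup with Structured Streaming, assuming a streaming
table named 'stream_table' already exists, Kafka runs on localhost:9092 with a
topic 'events', and the sink options follow the CarbonData structured
streaming examples:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.streaming.Trigger

val carbon = SparkSession.builder()
  .master("local[2]")
  .appName("KafkaToCarbonStream")
  .getOrCreateCarbonSession("/tmp/carbon_store")

// Read the raw Kafka records as strings.
val kafkaDf = carbon.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING)")

// Append each mini-batch into the stream segment of the carbon table.
val query = kafkaDf.writeStream
  .format("carbondata")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .option("checkpointLocation", "/tmp/carbon_stream_ckpt")
  .option("dbName", "default")
  .option("tableName", "stream_table")
  .start()

query.awaitTermination()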



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.5.2(RC1) release

2019-01-22 Thread xm_zzc
-1. IMO, it would be better to include PR#3082, PR#3089 and PR#3090 in version 1.5.2.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-04 Thread xm_zzc
+1 .



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Chuanyin Xu as new PMC for Apache CarbonData

2019-01-01 Thread xm_zzc
Congratulation, Chuanyin Xu.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


[carbondata-presto enhancements] support reading stream segment in presto

2018-12-13 Thread xm_zzc
Hi all:
  Do we plan to support reading stream segments in Presto? If yes, when will
this feature be implemented?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Bo Xu as new Apache CarbonData committer

2018-12-08 Thread xm_zzc
Congrats xubo.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


How to compile CarbonData 1.5.1 with Spark 2.3.1

2018-12-05 Thread xm_zzc
Hi:
  The steps to compile CarbonData 1.5.1 with Spark 2.3.1 are as follows:
  1. Overwrite CarbonDataSourceScan.scala: cp -f
integration/spark2/src/main/commonTo2.1And2.2/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala
integration/spark2/src/main/spark2.3/org/apache/spark/sql/execution/strategy/CarbonDataSourceScan.scala

  2. Edit
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/bigdecimal/TestBigDecimal.scala:
 
Line 48: change 'salary decimal(30, 10))' to 'salary decimal(27, 10))'

  3. Edit
integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala:
1) Line 297: change 'classOf[Seq[String]],' to
'classOf[Seq[Attribute]],'
2) Replace Line 299-301 with a line 'method.invoke(dataSourceObj, mode,
query, query.output, physicalPlan)';

  4. Use command to compile: mvn -DskipTests -Pspark-2.3 -Phadoop-2.8
-Pbuild-with-format -Pmv -Dspark.version=2.3.1
-Dhadoop.version=2.6.0-cdh5.8.3 clean package. It worked.

  You can refer to PR#2779.




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.5.1(RC2) release

2018-12-03 Thread xm_zzc
+1 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Throw 'NoSuchElementException: None.get' error when use CarbonSession to read parquet.

2018-11-21 Thread xm_zzc
PR#2863 has fixed this issue, thanks, Ravindra.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Proposal] Thoughts on general guidelines to follow in Apache CarbonData community

2018-11-18 Thread xm_zzc
+1 for 1,2,3,4,5,7,8,9,10.
+0 for 6.

One suggestion for 7: it would be better to give a performance comparison
(TPCH) report for each release version.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Throw 'NoSuchElementException: None.get' error when use CarbonSession to read parquet.

2018-11-15 Thread xm_zzc
Hi:
  Please help. I used CarbonSession to read parquet and it threw a
'NoSuchElementException: None.get' error; reading carbondata files is OK.
  *Env*: local mode, Spark 2.3 + CarbonData (master branch)
  *Code*: 
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val spark = SparkSession
  .builder()
  .master("local[1]")
  .appName("Carbon1_5")
  .config("spark.sql.warehouse.dir", warehouse)
  .config("spark.default.parallelism", 4)
  .config("spark.sql.shuffle.partitions", 4)
  .getOrCreateCarbonSession(storeLocation, Constants.METASTORE_DB)
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val parquets = spark.read.parquet("/data1/parquets/")
println(parquets.count())

  *Error*:
Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.spark.sql.sources.DataSourceRegister: Provider
org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat
could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at
scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:258)
at 
scala.collection.TraversableLike$class.filter(TraversableLike.scala:270)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at 
org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:622)
at 
org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:606)
at cn.xm.zzc.carbonmaster.Carbon1_5$.testReadSpeed(Carbon1_5.scala:434)
at cn.xm.zzc.carbonmaster.Carbon1_5$.main(Carbon1_5.scala:105)
at cn.xm.zzc.carbonmaster.Carbon1_5.main(Carbon1_5.scala)
Caused by: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at
org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat.<init>(SparkCarbonFileFormat.scala:120)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 17 more

  Thanks.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread xm_zzc
Hi David:
  please see the call stack: 

 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread xm_zzc
Hi:
  The root cause is that when a select SQL is executed, BlockDataMap calls
'SegmentPropertiesAndSchemaHolder.addSegmentProperties' to add segment info
one by one. Meanwhile, if some segments are updated, for example when a stream
segment is handed off, the handoff thread calls
'SegmentPropertiesAndSchemaHolder.invalidate' to delete segment info one by
one; if segmentIdAndSegmentPropertiesIndexWrapper.segmentIdSet.isEmpty() is
true, it removes the segmentPropertiesIndex. But the select thread is still
using that segmentPropertiesIndex to add/get segment info, and then the NPE
occurs.
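
A simplified illustration of the race (a sketch only, not the actual
CarbonData code; the class, field and method names below are invented for the
example):

import java.util.concurrent.ConcurrentHashMap
import scala.collection.concurrent.TrieMap
import scala.collection.mutable

object SegmentHolderRaceSketch {
  // index -> segment-properties wrapper, shared by query and handoff threads
  private val indexToWrapper = new ConcurrentHashMap[Int, String]()
  // index -> segment ids still referencing that index
  private val segmentIdsPerIndex = TrieMap[Int, mutable.Set[String]]()

  // Query thread: assumes the index it cached earlier is still present.
  def getSchema(index: Int): String = {
    // Returns null if the handoff thread removed the index in the meantime,
    // which later surfaces as the NullPointerException seen in the query.
    indexToWrapper.get(index)
  }

  // Handoff thread: drops the whole index once its segment set drains.
  def invalidate(index: Int, segmentId: String): Unit = {
    segmentIdsPerIndex.get(index).foreach { ids =>
      ids -= segmentId
      if (ids.isEmpty) {
        // check-then-act without coordinating with readers of indexToWrapper
        indexToWrapper.remove(index)
      }
    }
  }
}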



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Throw NullPointerException occasionally when query from stream table

2018-10-31 Thread xm_zzc
Hi: 
  I added some logs to trace this problem and found that when
BlockDataMap.getFileFooterEntrySchema is called, the key
'segmentPropertiesIndex' stored in the BlockDataMap instance has already been
removed by another thread from
SegmentPropertiesAndSchemaHolder.indexToSegmentPropertiesWrapperMapping:

2018-10-31 14:49:24,967
datastore.block.SegmentPropertiesAndSchemaHolder.addSegmentProperties(SegmentPropertiesAndSchemaHolder.java:115)
- 
Thread-39 -put 37 into indexToSegmentPropertiesWrapperMapping 0 
2018-10-31 14:49:25,472
datastore.block.SegmentPropertiesAndSchemaHolder.invalidate(SegmentPropertiesAndSchemaHolder.java:243)
- 
Executor task launch worker for task 926 -remove 37 out of
indexToSegmentPropertiesWrapperMapping 31 
2018-10-31 14:49:25,486
indexstore.blockletindex.BlockDataMap.getFileFooterEntrySchema(BlockDataMap.java:1002)
- 
Thread-39 -get 37 null


2018-10-31 14:56:45,057
datastore.block.SegmentPropertiesAndSchemaHolder.addSegmentProperties(SegmentPropertiesAndSchemaHolder.java:115)
- 
Thread-39 -put 98 into indexToSegmentPropertiesWrapperMapping 0 
2018-10-31 14:56:45,477
datastore.block.SegmentPropertiesAndSchemaHolder.invalidate(SegmentPropertiesAndSchemaHolder.java:243)
- 
Executor task launch worker for task 2653 -remove 98 out of
indexToSegmentPropertiesWrapperMapping 67
2018-10-31 14:56:46,290
indexstore.blockletindex.BlockDataMap.getFileFooterEntrySchema(BlockDataMap.java:1002)
- 
Thread-39 -get 98 null 
2018-10-31 14:56:51,392
indexstore.blockletindex.BlockDataMap.getFileFooterEntrySchema(BlockDataMap.java:1002)
- 
Thread-39 -get 98 null 




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] CarbonReader performance improvement

2018-10-29 Thread xm_zzc
+1.




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Low Performance of full scan.

2018-10-29 Thread xm_zzc
Hi Ravindra:
I re-tested my test cases mentioned above with Spark 2.3.2 + the CarbonData
master branch; the query performance of CarbonData is now almost the same as
Parquet:

*Test result:*
  SQL1: Parquet:    4.6s   4s     3.8s
        CarbonData: 4.7s   3.6s   3.5s
  SQL2: Parquet:    9s     8s     8s
        CarbonData: 9s     8s     8s

  The query performance of CarbonData has improved a lot (SQL1: 12s to 4s,
SQL2: 18s to 8s) while the query performance of Parquet has also improved
(SQL2: 10s to 8s). That's great.
  But in the test results you mentioned in
'http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/CarbonData-Performance-Optimization-td62950.html',
the query performance of CarbonData was almost always better than Parquet. I
want to know how you tested those cases. And are there other optimizations
that have not been merged yet?

Regards, 
Zhichao.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Throw NullPointerException occasionally when query from stream table

2018-10-29 Thread xm_zzc
Hi:
  I ran a structured streaming app in local[4] mode (Spark 2.3.2 + CarbonData
master branch) to insert data, and then started a thread to execute a select
SQL; a 'NullPointerException' occurred occasionally.
  *I found that the smaller the value of CarbonCommonConstants.HANDOFF_SIZE
is, the more easily the error occurs*.
  Please see my test code:  CarbonStream1_5.scala

  
  
  The  NullPointerException is :
  Exception in thread "Thread-42" java.lang.NullPointerException
at
org.apache.carbondata.core.indexstore.blockletindex.BlockDataMap.getFileFooterEntrySchema(BlockDataMap.java:1001)
at
org.apache.carbondata.core.indexstore.blockletindex.BlockDataMap.prune(BlockDataMap.java:656)
at
org.apache.carbondata.core.indexstore.blockletindex.BlockDataMap.prune(BlockDataMap.java:743)
at
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getAllBlocklets(BlockletDataMapFactory.java:391)
at
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:132)
at
org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:491)
at
org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:412)
at
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:528)
at
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:219)
at
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:127)
at
org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:67)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
at
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:321)
at
org.apache.spark.sql.execution.TakeOrderedAndProjectExec.doExecute(limit.scala:154)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
at
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at

Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-17 Thread xm_zzc
Hi Kunal Kapoor: 
  I have patched PR#2796 into 1.3.1 and run the stream app again; this issue
does not happen often, so I will run it for a few days to check whether the
fix works.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-17 Thread xm_zzc
Hi Kunal Kapoor:
  1.  No;
  2.  The query fails; I used the Carbon SDK Reader to read that wrong segment
and it failed too.
  3.  The schema of the table:

| rt| string
| timestamp_1min| bigint
| timestamp_5min| bigint
| timestamp_1hour   | bigint
| customer_id   | bigint
| transport_id  | bigint
| transport_code| string
| tcp_udp   | int   
| pre_hdt_id| string
| hdt_id| string
| status| int   
| is_end_user   | int   
| transport_type| string
| transport_type_nam| string
| fcip  | string
| host  | string
| cip   | string
| code  | int   
| conn_status   | int   
| recv  | bigint
| send  | bigint
| msec  | bigint
| dst_prefix| string
| next_type | int   
| next  | string
| hdt_sid   | string
| from_endpoint_type| int   
| to_endpoint_type  | int   
| fcip_view | string
| fcip_country  | string
| fcip_province | string
| fcip_city | string
| fcip_longitude| string
| fcip_latitude | string
| fcip_node_name| string
| fcip_node_name_cn | string
| host_view | string
| host_country  | string
| host_province | string
| host_city | string
| host_longitude| string
| host_latitude | string
| cip_view  | string
| cip_country   | string
| cip_province  | string
| cip_city  | string
| cip_longitude | string
| cip_latitude  | string
| cip_node_name | string
| cip_node_name_cn  | string
| dtp_send  | string
| client_port   | int   
| server_ip | string
| server_port   | int   
| state | string
| response_code | int   
| access_domain | string
| valid | int   
| min_batch_time| bigint
| update_time   | bigint
|   |   
| ##Detailed Table Information  |   
| Database Name | hdt_sys   

Re: java.lang.NegativeArraySizeException occurred when compact

2018-10-16 Thread xm_zzc
Hi Babu:
  Thanks for your reply.
  I set enable.unsafe.in.query.processing=false and
enable.unsafe.columnpage=false, and the test still failed.
  I think the issue I met is not related to a MemoryBlock being cleaned by
some other thread. Following the test steps I mentioned above, I copied the
wrong segment and used the SDKReader to read it, and it failed too; the error
message is as follows:
*java.lang.RuntimeException: java.lang.IllegalArgumentException
at
org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:120)
at
org.apache.carbondata.core.scan.result.BlockletScannedResult.fillDataChunks(BlockletScannedResult.java:355)
at
org.apache.carbondata.core.scan.result.BlockletScannedResult.hasNext(BlockletScannedResult.java:559)
at
org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector.collectResultInRow(DictionaryBasedResultCollector.java:137)
at
org.apache.carbondata.core.scan.processor.DataBlockIterator.next(DataBlockIterator.java:109)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:49)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:1)
at
org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator.hasNext(ChunkRowIterator.java:58)
at
org.apache.carbondata.hadoop.CarbonRecordReader.nextKeyValue(CarbonRecordReader.java:104)
at
org.apache.carbondata.sdk.file.CarbonReader.hasNext(CarbonReader.java:71)
at cn.xm.zzc.carbonsdktest.CarbonSDKTest.main(CarbonSDKTest.java:68)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at
org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimensionDataChunkStore.putArray(UnsafeVariableLengthDimensionDataChunkStore.java:97)
at
org.apache.carbondata.core.datastore.chunk.impl.VariableLengthDimensionColumnPage.<init>(VariableLengthDimensionColumnPage.java:58)
at
org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimensionLegacy(CompressedDimensionChunkFileBasedReaderV3.java:325)
at
org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeDimension(CompressedDimensionChunkFileBasedReaderV3.java:266)
at
org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3.decodeColumnPage(CompressedDimensionChunkFileBasedReaderV3.java:224)
at
org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk.convertToDimColDataChunkWithOutCache(DimensionRawColumnChunk.java:118)
... 11 more*

When the error occurred, the values of some parameters in
UnsafeVariableLengthDimensionDataChunkStore.putArray were as follows:

buffer.limit=192000
buffer.cap=192000
startOffset=300289
numberOfRows=32000
this.dataPointersOffsets=288000

startOffset is bigger than buffer.limit, so the error occurred.




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


java.lang.NegativeArraySizeException occurred when compact

2018-10-15 Thread xm_zzc
Hi:
  I encountered a 'java.lang.NegativeArraySizeException' error with carbondata
1.3.1 + spark 2.2.
  When I ran the compact command to compact 8 level-1 segments into one
level-2 segment, the 'java.lang.NegativeArraySizeException' error occurred:
*java.lang.NegativeArraySizeException
at
org.apache.carbondata.core.datastore.chunk.store.impl.unsafe.UnsafeVariableLengthDimesionDataChunkStore.getRow(UnsafeVariableLengthDimesionDataChunkStore.java:172)
at
org.apache.carbondata.core.datastore.chunk.impl.AbstractDimensionDataChunk.getChunkData(AbstractDimensionDataChunk.java:46)
at
org.apache.carbondata.core.scan.result.AbstractScannedResult.getNoDictionaryKeyArray(AbstractScannedResult.java:431)
at
org.apache.carbondata.core.scan.result.impl.NonFilterQueryScannedResult.getNoDictionaryKeyArray(NonFilterQueryScannedResult.java:67)
at
org.apache.carbondata.core.scan.collector.impl.RawBasedResultCollector.scanResultAndGetData(RawBasedResultCollector.java:83)
at
org.apache.carbondata.core.scan.collector.impl.RawBasedResultCollector.collectData(RawBasedResultCollector.java:58)
at
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:51)
at
org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.next(DataBlockIteratorImpl.java:32)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.getBatchResult(DetailQueryResultIterator.java:49)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:41)
at
org.apache.carbondata.core.scan.result.iterator.DetailQueryResultIterator.next(DetailQueryResultIterator.java:31)
at
org.apache.carbondata.core.scan.result.iterator.RawResultIterator.hasNext(RawResultIterator.java:72)
at
org.apache.carbondata.processing.merger.RowResultMergerProcessor.execute(RowResultMergerProcessor.java:131)
at
org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:228)
at
org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:84)
at
org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)*

I traced the code of 'UnsafeVariableLengthDimesionDataChunkStore.getRow' and
found that the root cause is that the value of length is negative when the
byte array is created ('byte[] data = new byte[length];'). The values of some
parameters when the error occurred are below:

when 'rowId < numberOfRows - 1':
*this.dataLength=192000
currentDataOffset=2
rowId=0
OffsetOfNextdata=-12173  (why)
length=-12177*

otherwise :

*this.dataLength=32
currentDataOffset=263702
rowId=31999
length=-9238*

the value of (32 - 263702) exceeds the range of short.

I patched PR#2796 (https://github.com/apache/carbondata/pull/2796), but the
error still occurred.

Finally, my test steps were:

for example, there are 4 level-1 compacted segments: 1.1, 2.1, 3.1, 4.1:
*1. run the compact command, it failed;
2. delete segment 1.1, run the compact command again, it failed;
3. delete segment 2.1, run the compact command again, it failed;
4. delete segment 3.1, run the compact command again, it succeeded;*

So I think that one of the 8 level-1 compacted segments may have some problem,
but I don't know how to find it out.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: carbondata1.5 compatible with spark2.3.1

2018-10-12 Thread xm_zzc
Hi Sujith:
  Currently it can't be compiled with Spark 2.3.1 after merging PR#2779,
because Spark 2.3.2 has some interface changes; please see
https://github.com/apache/carbondata/pull/2779



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: carbondata1.5 compatible with spark2.3.1

2018-10-12 Thread xm_zzc
Hi:
   The CarbonData community decided to support Spark 2.3.2 only, because Spark
2.3.1 has some critical issues. If you need to integrate Spark 2.3.1, please
contact me on the mailing list and I will tell you how to do it.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Reduce the size of assembly jar

2018-10-12 Thread xm_zzc
Hi:
  This is not an issue. I found that the assembly jar includes the
aws-sdk-java-bundle module (1.11.X), which is about 60MB; if you don't need
this module, add 'com.amazonaws:*' to the maven-shade-plugin excludes in the
assembly pom.xml.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

2018-09-30 Thread xm_zzc
Hi Aaron:
  Can you list your create-table SQL and the select SQL for us? And is it
correct with Spark 2.2?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: CarbonData Performance Optimization

2018-09-27 Thread xm_zzc
So excited. Good optimization.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.5.0(RC1) release

2018-09-26 Thread xm_zzc
Hi:
  Can this release include PR-2733: Upgrade presto integration version to
0.210?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Raghunandan as new committer of Apache CarbonData

2018-09-26 Thread xm_zzc
Congratulations Raghunandan, welcome aboard !!



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Propose to upgrade the version of integration/presto from 0.187 to 0.206

2018-09-18 Thread xm_zzc
Hi:
  I think we can upgrade Presto to 0.210 for CarbonData 1.5; 0.210 has fixed a
'JDBC Driver' issue that is hit frequently:
  *Deallocate prepared statement when PreparedStatement is closed.
Previously, Connection became unusable after many prepared statements were
created.*




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Implement file-level Min/Max index for streaming segment

2018-08-26 Thread xm_zzc
+1.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Support Standard Spark's FileFormat interface in Carbondata

2018-08-23 Thread xm_zzc
+1, Good feature.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-08-21 Thread xm_zzc
Hi all:
  Got it, thanks for your suggestions, I will implement this and raise a pr.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-08-19 Thread xm_zzc
Hi dev:
  Now I am working on this; the new format is shown in the attachment, please
give me some feedback.
  There is one question: if a user uses CTAS to create a table, do we need to
show the 'select sql' in the result of 'desc formatted table'? If yes, how do
we get the 'select sql'? For now I can only get a non-formatted SQL from
'CarbonSparkSqlParser.scala' (as Jacky mentioned), for example:

*CREATE TABLE IF NOT EXISTS test_table
STORED BY 'carbondata'
TBLPROPERTIES(
'streaming'='false', 'sort_columns'='id,city', 'dictionary_include'='name')
AS SELECT * from source_test ;*

The non-formatted SQL I get is:
*SELECT*fromsource_test*

desc_formatted.txt

  
desc_formatted_external.txt

  






--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.4.1(RC2) release

2018-08-14 Thread xm_zzc
+1. 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-07-06 Thread xm_zzc
please see:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t1/desc_table_info.txt



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Carbon Local Dictionary Support

2018-06-14 Thread xm_zzc
Hi kumarvishal09:
  Will this feature be supported on stream tables too? 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Use RowStreamParserImp as default value of config 'carbon.stream.parser'

2018-06-06 Thread xm_zzc
Hi dev:
  Currently the default value of 'carbon.stream.parser' is
CSVStreamParserImp, which transforms InternalRow(0) to Array[Object], where
InternalRow(0) represents the string value of one line. But generally users
will analyse data and save it as a Row(Int, String, String, Double...), not a
Row(String), so I think the parser 'RowStreamParserImp' is used more often in
real scenarios.
So should we use 'RowStreamParserImp' as the default value? Feedback is
welcome.
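
A minimal sketch of selecting the row parser explicitly on a streaming write
(assumptions: 'df' is an already analysed streaming DataFrame whose columns
match the target stream table, the table and checkpoint names are
placeholders, and the fully qualified class name of RowStreamParserImp is
assumed):

val query = df.writeStream
  .format("carbondata")
  .option("checkpointLocation", "/tmp/carbon_ckpt")
  .option("dbName", "default")
  .option("tableName", "stream_table")
  // The property discussed above; pass the parser class name explicitly.
  .option("carbon.stream.parser",
    "org.apache.carbondata.streaming.parser.RowStreamParserImp")
  .start()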



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Carbon Local Dictionary Support

2018-06-05 Thread xm_zzc
Hi:
  +1.
  This is an exciting feature, hope to have it in version 1.5.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Support updating/deleting data for stream table

2018-06-02 Thread xm_zzc
Hi  Raghu:
  Yep, you are right; that is why I said solution 1 is not very precise when
some of the data you want to update/delete is still stored in stream segments,
while solution 2 can handle the scenario you mentioned.
  But, in my opinion, the scenario of deleting historical data is more common
than that of updating data. The data size of a stream table grows day by day,
and users generally want to delete specific data to keep the data size from
growing too large; for example, if a user wants to keep data for one year, he
needs to delete the data that is one year old every day. On the other hand,
solution 2 is more complicated than solution 1, and we need to consider its
implementation in depth.
  For the above reasons, Liang Chen, Jacky, David and I preferred to implement
solution 1 first. Is that OK for you?
  
  Is there any other suggestion?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Support updating/deleting data for stream table

2018-05-29 Thread xm_zzc
Hi dev:
  Sometimes we need to delete some historical data from a stream table to keep
the table size from growing too large, but currently a stream table doesn't
support updating/deleting data, so we have to stop the app, use the 'alter
table COMPACT 'close_streaming'' command to close the stream table, and then
delete the data.
  According to a discussion with Jacky and David offline, there are two
solutions to resolve this without stopping the app:
  
  1. set all non-stream segments in the 'carbon.input.segments.tablename'
property so that the delete only touches data outside the stream segment (see
the sketch after this list); this is easy to implement;
  2. support deleting data in the stream segment too; this is more
complicated.
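
A minimal sketch of solution 1 (assumptions: a stream table 'stream_tbl' in
database 'default' whose finished columnar segments are 0 and 1, a
hypothetical 'event_time' column for the delete predicate, and the proposed
behaviour that update/delete honours 'carbon.input.segments'):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .master("local[2]")
  .appName("DeleteOutsideStreamSegment")
  .getOrCreateCarbonSession("/tmp/carbon_store")

// Restrict the session to the finished segments so the delete does not
// touch the live stream segment.
carbon.sql("SET carbon.input.segments.default.stream_tbl=0,1")
carbon.sql("DELETE FROM stream_tbl WHERE event_time < '2017-06-01'")
// Restore visibility of all segments afterwards.
carbon.sql("SET carbon.input.segments.default.stream_tbl=*")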
  
  I think we can implement solution 1 first, and then consider the
implementation of solution 2 in depth.
  
  Feedback is welcome, thanks. 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.4.0(RC2) release

2018-05-27 Thread xm_zzc
+1



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] About syntax of compaction on specified segments

2018-03-13 Thread xm_zzc
I prefer option 3. Using a new compaction type called 'CUSTOM', which is
different from minor and major, will not confuse users.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.3.0(RC2) release

2018-02-05 Thread xm_zzc
+1

The issue I fixed in PR 1928 (https://github.com/apache/carbondata/pull/1928)
is *not* a blocking issue for releasing CarbonData 1.3.0; we can merge this PR
into 1.3.1, which will be released soon after 1.3.0.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.3.0(RC2) release

2018-02-03 Thread xm_zzc
Hi dev:
  there are two errors when compiling with JDK 1.7; I have raised a PR to fix
them: https://github.com/apache/carbondata/pull/1928 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Should CarbonData need to integrate with Spark Streaming too?

2018-01-16 Thread xm_zzc
Liang Chen wrote
> Hi
> 
> Thanks for you started this discussion for adding spark streaming support.
> 1. Please try to utilize the current code(structured streaming), not
> adding
> separated logic code for spark streaming. 

[reply] The original idea is to reuse the current code (structured streaming)
to implement the integration with Spark Streaming.


Liang Chen wrote
> 2. I suggest that by default is using structured streaming , please
> consider
> how to make configuration for enabling/switching to spark streaming.

[reply] The implementations of Structured Streaming and Spark Streaming are
different, and their usages are different too, so I don't understand what
'consider how to make configuration for enabling/switching to spark streaming'
means. IMO, we just need to implement a utility to write RDD data to the
streaming segment in DStream.foreachRDD; the logic of this utility is the same
as CarbonAppendableStreamSink.addBatch, right?


Liang Chen wrote
> Regards
> Liang
> 
> 
> xm_zzc wrote
>> Hi dev:
>>   Currently CarbonData 1.3(will be released soon) just support to
>> integrate
>> with Spark Structured Streaming which requires Kafka's version must be >=
>> 0.10. I think there are still many users  integrating Spark Streaming
>> with
>> kafka 0.8, at least our cluster is, but the cost of upgrading kafka is
>> too
>> much. So should CarbonData need to integrate with Spark Streaming too?
>>   
>>   I think there are two ways to integrate with Spark Streaming, as
>> following:
>>   1). CarbonData batch data loading + Auto compaction
>>   Use CarbonSession.createDataFrame to convert rdd to DataFrame in
>> InputDStream.foreachRDD, and then save rdd data into CarbonData table
>> which
>> support auto compaction. In this way, it can support to create
>> pre-aggregate
>> tables on this main table too (Streaming table does not support to create
>> pre-aggregate tables on it).
>>   
>>   I can test with this way in our QA env and add example to CarbonData.
>>   
>>   2). The same as integration with Structured Streaming
>>   With this way, Structured Streaming append every mini-batch data into
>> stream segment which is row format, and then when the size of stream
>> segment
>> is greater than 'carbon.streaming.segment.max.size', it will auto convert
>> stream segment to batch segment(column format) at the begin of each batch
>> and create a new stream segment to append data.
>>   However, I have no idea how to integrate with Spark Streaming yet, *any
>> suggestion for this*? 
>> 
>> 
>> 
>> --
>> Sent from:
>> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> 
> 
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/





--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Should CarbonData need to integrate with Spark Streaming too?

2018-01-16 Thread xm_zzc
Hi Jacky:
>>  1). CarbonData batch data loading + Auto compaction 
>>  Use CarbonSession.createDataFrame to convert rdd to DataFrame in 
>> InputDStream.foreachRDD, and then save rdd data into CarbonData table
>> which 
>> support auto compaction. In this way, it can support to create
>> pre-aggregate 
>> tables on this main table too (Streaming table does not support to create 
>> pre-aggregate tables on it). 
>> 
>>  I can test with this way in our QA env and add example to CarbonData.
>
>This approach is doable, but the loading interval should be relative longer
since it still uses columnar file in >this approach. I am not sure how
frequent you do one batch load? 

Agree. The loading interval should be relatively longer, maybe 15s, 30s, or
even 1 min, but it also depends on the data size of every mini-batch.
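
A minimal sketch of approach 1 (batch loading inside DStream.foreachRDD),
assuming the target table 'dstream_tbl' already exists with a (key, value)
string schema; a socket stream stands in for the Kafka 0.8 DStream only to
keep the sketch self-contained:

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val carbon = SparkSession.builder()
  .master("local[2]")
  .appName("DStreamToCarbon")
  .getOrCreateCarbonSession("/tmp/carbon_store")
val ssc = new StreamingContext(carbon.sparkContext, Seconds(30))

val schema = StructType(Seq(
  StructField("key", StringType, nullable = true),
  StructField("value", StringType, nullable = true)))

// In practice this would come from KafkaUtils.createDirectStream (Kafka 0.8).
val stream = ssc.socketTextStream("localhost", 9999).map(line => ("k", line))

stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // Convert the RDD to a DataFrame and append it as a normal batch load;
    // auto compaction can merge the small loads later.
    val df = carbon.createDataFrame(rdd.map { case (k, v) => Row(k, v) }, schema)
    df.write
      .format("carbondata")
      .option("tableName", "dstream_tbl")
      .mode(SaveMode.Append)
      .save()
  }
}

ssc.start()
ssc.awaitTermination()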

>>  2). The same as integration with Structured Streaming 
>>  With this way, Structured Streaming append every mini-batch data into 
>> stream segment which is row format, and then when the size of stream
>> segment 
>> is greater than 'carbon.streaming.segment.max.size', it will auto convert 
>> stream segment to batch segment(column format) at the begin of each batch 
>> and create a new stream segment to append data. 
>>  However, I have no idea how to integrate with Spark Streaming yet, *any 
>> suggestion for this*? 
>>
>
>You can refer to the logic in CarbonAppendableStreamSink.addBatch,
basically it launches a job to do >appending to row format files in the
streaming segment by invoking >CarbonAppendableStreamSink.writeDataFileJob.
At beginning, you can invoke checkOrHandOffSegment >to create the streaming
segment. 
>I think integrate with the SparkStreaming is a good feature to have, it
enables more user to use carbon >streaming ingest feature on existing
cluster setting with old spark and Kafka version. 
>Please feel free to create JIRA ticket and discuss in the community. 

OK, I have read the code of the streaming module and discussed it with David
offline; I will implement this feature ASAP.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Should CarbonData need to integrate with Spark Streaming too?

2018-01-16 Thread xm_zzc
Hi dev:
  Currently CarbonData 1.3 (to be released soon) only supports integration
with Spark Structured Streaming, which requires Kafka version >= 0.10. I think
there are still many users integrating Spark Streaming with Kafka 0.8, at
least our cluster does, and the cost of upgrading Kafka is too high. So should
CarbonData integrate with Spark Streaming too?
  
  I think there are two ways to integrate with Spark Streaming, as
following:
  1). CarbonData batch data loading + auto compaction
  Use CarbonSession.createDataFrame to convert the RDD to a DataFrame in
InputDStream.foreachRDD, and then save the RDD data into a CarbonData table
with auto compaction enabled. In this way, it can also support creating
pre-aggregate tables on this main table (a streaming table does not support
creating pre-aggregate tables on it).
  
  I can test this way in our QA env and add an example to CarbonData.
  
  2). The same as the integration with Structured Streaming
  With this approach, Structured Streaming appends every mini-batch of data
into a stream segment in row format, and when the size of the stream segment
is greater than 'carbon.streaming.segment.max.size', it automatically converts
the stream segment into a batch segment (column format) at the beginning of
each batch and creates a new stream segment to append data.
  However, I have no idea how to integrate this with Spark Streaming yet; *any
suggestion for this*? 



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Kumar Vishal as new PMC for Apache CarbonData

2018-01-10 Thread xm_zzc
 Congratulations Vishal !



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] David Cai as new PMC for Apache CarbonData

2018-01-10 Thread xm_zzc
 Congratulations David !



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [VOTE] Apache CarbonData 1.3.0(RC1) release

2018-01-09 Thread xm_zzc
Hi ravipesala:
  I found that there are some unresolved JIRA bugs related to version 1.3:
 
https://issues.apache.org/jira/browse/CARBONDATA-2001?jql=project%20%3D%20CARBONDATA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20affectedVersion%20%3D%201.3.0%20AND%20fixVersion%20%3D%201.3.0%20ORDER%20BY%20updated%20DESC

  Shouldn't these bugs be solved before releasing branch-1.3?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Should we use Spark 2.2.1 as default version for Spark-2.2 supported

2018-01-02 Thread xm_zzc
Thanks for your replies, Jacky, Liang, Raghunandan S and sounak.
I will raise a jira task and pr to upgrade spark version to 2.2.1.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Should we use Spark 2.2.1 as default version for Spark-2.2 supported

2018-01-01 Thread xm_zzc
Hi dev:
  Any suggestions about this?  @ravipesala  @David CaiQiang or others.




--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion]support table level compaction configuration

2017-11-12 Thread xm_zzc
It's a good suggestion. Have you started working on this?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [PROPOSAL] Tag Pull Request with feature tag

2017-10-29 Thread xm_zzc
+1, we can add a new feature tag called "Spark2.2", OK?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-26 Thread xm_zzc
Hi ravipesala:
  Ok, I will raise jira for this and try to implement this.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-25 Thread xm_zzc
Hi:
  If we are using carbondata + spark to load data, we can set
carbon.number.of.cores.while.loading to the number of executor cores.

  When the number of executor cores is set to 6, it means there are at least 6
cores available per node for loading data, so we could set
carbon.number.of.cores.while.loading to 6 automatically.
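
A minimal sketch of deriving the loading cores from the executor cores (the
property keys are the usual Spark/CarbonData ones; wiring them together
automatically is the suggestion under discussion, not existing behaviour):

import org.apache.spark.sql.SparkSession
import org.apache.carbondata.core.util.CarbonProperties

val spark = SparkSession.builder().appName("LoadingCoresSketch").getOrCreate()
// Fall back to 1 if spark.executor.cores is not set explicitly.
val executorCores = spark.conf.get("spark.executor.cores", "1")
CarbonProperties.getInstance()
  .addProperty("carbon.number.of.cores.while.loading", executorCores)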



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion]support user specified segments in major compation

2017-10-23 Thread xm_zzc
+1, this feature sounds good.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: 回复:[DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-10 Thread xm_zzc
+1



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Add an option such as 'carbon.update.storage.level' to configurate the storage level when updating data with 'carbon.update.persist.enable'='true'

2017-09-07 Thread xm_zzc
ok, I will raise a pr to implement this.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Add an option such as 'carbon.update.storage.level' to configurate the storage level when updating data with 'carbon.update.persist.enable'='true'

2017-09-04 Thread xm_zzc
Hi all, any suggestion about this?



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Add an option such as 'carbon.update.storage.level' to configurate the storage level when updating data with 'carbon.update.persist.enable'='true'

2017-09-03 Thread xm_zzc
Hi all:
  When updating data, if 'carbon.update.persist.enable'='true', the dataset in
the method 'ProjectForUpdateCommand.processData' will be persisted, but
currently its storage level is fixed to MEMORY_AND_DISK; should it be
configurable by the user or not?
  If yes, I can implement it and add an option such as
'carbon.update.storage.level' to configure it.
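
A sketch of how the proposed option might be used (the key
'carbon.update.storage.level' does not exist yet; the level name follows
org.apache.spark.storage.StorageLevel):

import org.apache.carbondata.core.util.CarbonProperties

CarbonProperties.getInstance()
  .addProperty("carbon.update.persist.enable", "true")
  // Proposed: let the user pick the storage level of the persisted dataset.
  .addProperty("carbon.update.storage.level", "MEMORY_ONLY_SER")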



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [ANNOUNCE] Manish Gupta as new Apache CarbonData

2017-08-27 Thread xm_zzc
Congratulations Manish!!!



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ANNOUNCE-Manish-Gupta-as-new-Apache-CarbonData-tp20750p20804.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

2017-07-28 Thread xm_zzc
Hi guys:
  I ran CarbonData (master branch) + Spark 2.1.1 in yarn-client mode; there is
an error when I execute a select SQL. The details are as follows:

  My env: CarbonData (master branch, 2456 commits) + Spark 2.1.1, running in
yarn-client mode;

  spark shell:  */opt/spark2/bin/spark-shell --master yarn --deploy-mode
client --files
/opt/spark2/conf/log4j_all.properties#log4j.properties,/opt/spark2/conf/carbon.properties
--driver-memory 6g --num-executors 6 --executor-memory 5g --executor-cores 1
--driver-library-path :/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
--jars
/opt/spark2/carbonlib/carbondata_2.11-1.2.0-shade-hadoop2.6.0-cdh5.7.1.jar*;

  carbon.properties:
  *  carbon.storelocation=hdfs://hdtcluster/carbon_store
  carbon.ddl.base.hdfs.url=hdfs://hdtcluster/carbon_base_path
  carbon.bad.records.action=FORCE
  carbon.badRecords.location=/opt/carbondata/badrecords
  
  carbon.use.local.dir=true
  carbon.use.multiple.temp.dir=true
  
  carbon.sort.file.buffer.size=20
  carbon.graph.rowset.size=10
  carbon.number.of.cores.while.loading=6
  carbon.sort.size=50
  carbon.enableXXHash=true
  
  carbon.number.of.cores.while.compacting=2
  carbon.compaction.level.threshold=2,4
  carbon.major.compaction.size=1024
  carbon.enable.auto.load.merge=true
  
  carbon.number.of.cores=4
  carbon.inmemory.record.size=12
  carbon.enable.quick.filter=false
  
  carbon.timestamp.format=yyyy-MM-dd HH:mm:ss
  carbon.date.format=yyyy-MM-dd
  
  carbon.lock.type=HDFSLOCK
  
  enable.unsafe.columnpage=true*

  my code:
  *  import org.apache.spark.sql.SaveMode
  import org.apache.carbondata.core.util.CarbonProperties
  import org.apache.carbondata.core.constants.CarbonCommonConstants
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.CarbonSession._
  
  sc.setLogLevel("DEBUG")
  val carbon =
SparkSession.builder().appName("TestCarbonData").config(sc.getConf)
   .getOrCreateCarbonSession("hdfs://hdtcluster/carbon_store",
"/opt/carbondata/carbon.metastore")
  
  carbon.conf.set("spark.sql.parquet.binaryAsString", true)
  val testParquet = carbon.read.parquet("/tmp/cp_hundred_million")
  
  testParquet.createOrReplaceTempView("test_distinct")
  val orderedCols = carbon.sql("""
select chan, acarea, cache, code, rt, ts, fcip, url, size, host,
bsize, upsize, fvarf, fratio, 
   ua, uabro, uabrov, uaos, uaptfm, uadvc, msecdl, refer, pdate,
ptime, ftype 
from test_distinct
""")
  
  println(orderedCols.count())
  
  carbon.sql("""
  |  CREATE TABLE IF NOT EXISTS carbondata_hundred_million_pr1198 (
  |chan  string,
  |acareastring, 
  |cache string, 
  |code  int, 
  |rtstring, 
  |tsint, 
  |fcip  string, 
  |url   string, 
  |size  bigint, 
  |host  string, 
  |bsize bigint, 
  |upsizebigint, 
  |fvarf string, 
  |fratioint, 
  |uastring, 
  |uabro string, 
  |uabrovstring, 
  |uaos  string, 
  |uaptfmstring, 
  |uadvc string, 
  |msecdlbigint, 
  |refer string, 
  |pdate string, 
  |ptime string, 
  |ftype string 
  |  )
  |  STORED BY 'carbondata'
  |  TBLPROPERTIES('DICTIONARY_INCLUDE'='chan, acarea, cache, rt,
ts, fcip, ua, uabro, uabrov, uaos, uaptfm, uadvc, refer, ftype',
  |'NO_INVERTED_INDEX'='pdate, ptime',
  |'TABLE_BLOCKSIZE'='512'
  |  )
 """.stripMargin)
  carbon.catalog.listDatabases.show(false)
  carbon.catalog.listTables.show(false)  
  orderedCols.write
.format("carbondata")
.option("tableName", "carbondata_hundred_million_pr1198")
.option("tempCSV", "false")
.option("compress", "true")
.option("single_pass", "true") 
.mode(SaveMode.Append)
.save()
  carbon.sql("""
select count(1) from
default.carbondata_hundred_million_pr1198
""").show(100)
  carbon.sql("""
SHOW SEGMENTS FOR TABLE
default.carbondata_hundred_million_pr1198 limit 100
""").show*

  Data loading was successful, but when executing the select SQL, an error
occurred:
  *org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
tree:  
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)],
output=[count#253L])
   +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default,
Table name :carbondata_hundred_million_pr1198, Schema
:Some(StructType(StructField(chan,StringType,true),
StructField(acarea,StringType,true), 

Re: [DISCUSSION] In 1.2.0, use Spark 2.1 and Hadoop 2.7.2 as default compilation in pom.

2017-06-15 Thread xm_zzc
+1, using Spark 2.1 and Hadoop 2.7.2 as default compilation.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-In-1-2-0-use-Spark-2-1-and-Hadoop-2-7-2-as-default-compilation-in-pom-tp15278p15280.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


[DISCUSSION] Whether Carbondata should support Spark-2.2 in the next release version(1.2.0)

2017-06-09 Thread xm_zzc
Hi guys:
  Spark 2.2 will soon be released. Should CarbonData support Spark 2.2 in the
next release version (1.2.0), or just support Spark 2.1.1?
Please give your comments for the discussion.

  Thanks.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Whether-Carbondata-should-support-Spark-2-2-in-the-next-release-version-1-2-0-tp14332.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Program didn't stop after loading successfully

2017-06-08 Thread xm_zzc
Hi, ravipesala:
  I am using the carbondata master branch; this issue still occurs sometimes,
but not every time. I am sure that the master branch is up to date and that PR
933 has been merged.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Program-didn-t-stop-after-loading-successfully-tp12001p14216.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: Program didn't stop after loading successfully

2017-06-08 Thread xm_zzc
Hi, ravipesala:
  Any progress on this issue? I still encounter this problem often.



--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Program-didn-t-stop-after-loading-successfully-tp12001p14208.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Program didn't stop after loading successfully

2017-05-03 Thread xm_zzc
Hi all:
  I used Spark to read a parquet file and then saved it as a carbondata file,
but I found that *my program didn't stop* after loading the data successfully;
I saw the following log output: 
  
  *main -[MyUbuntu-64][myubuntu][Thread-1]Total time taken to write
dictionary file is: 40728
  main -[MyUbuntu-64][myubuntu][Thread-1]Data load is successful for
default.parquet_to_carbondata2
  main -main compaction need status is false
  main -main Successfully deleted the lock file
/data/carbon_data/default/parquet_to_carbondata2/meta.lock*

  According to the log above, I think the data load was successful, but why
did the program hang?

  My program is :

val warehouseLocation = Constants.SPARK_WAREHOUSE
val storeLocation = Constants.CARBON_FILES

CarbonProperties.getInstance()
  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
"-MM-dd HH:mm:ss")
  .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "-MM-dd")

import org.apache.spark.sql.CarbonSession._
val spark = SparkSession
  .builder()
  .appName("TestCarbonData")
  .master("local[2]")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .getOrCreateCarbonSession(storeLocation, Constants.METASTORE_DB)
  
spark.conf.set("spark.sql.parquet.binaryAsString", true)
val testParquet2 = spark.read.parquet("file:///data/cbd_test.parquet")

spark.sql("""DROP TABLE IF EXISTS parquet_to_carbondata2""")
  
testParquet2.show()
testParquet2.printSchema()
println("testParquet2: " + testParquet2.count())

testParquet2.write
  .format("carbondata")
  .option("tableName", "parquet_to_carbondata2")
  .option("tempCSV", "false")
  .option("compress", "true")
  .option("single_pass", "true")
  .option("partitionCount", "2")
  .option("table_blocksize", "8")
  .option("dictionary_exclude", "ftype,chan") 
  .mode(SaveMode.Overwrite)
  .save()
  
spark.stop()

*I used the 'jstack' command to trace this program; the messages are as
follows:*

   "Attach Listener" #438 daemon prio=9 os_prio=0 tid=0x7f91e80a3800
nid=0x1c44 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"DestroyJavaVM" #435 prio=5 os_prio=0 tid=0x7f9240015000 nid=0x191e
waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"nioEventLoopGroup-11-1" #96 prio=10 os_prio=0 tid=0x7f920009d000
nid=0x19a3 runnable [0x7f9160df9000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00067a853b30> (a
io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00065fa85800> (a java.util.Collections$UnmodifiableSet)
- locked <0x0006602e3ab8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:746)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

"threadDeathWatcher-9-1" #91 daemon prio=1 os_prio=0 tid=0x7f919c0d4000
nid=0x199e sleeping[0x7f9168389000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
io.netty.util.ThreadDeathWatcher$Watcher.run(ThreadDeathWatcher.java:152)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

"ForkJoinPool-2-worker-11" #65 daemon prio=5 os_prio=0
tid=0x7f919c001800 nid=0x1979 waiting on condition [0x7f916b3ef000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00067aa126b8> (a
scala.concurrent.forkjoin.ForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

   Locked ownable synchronizers:
- None

"BoneCP-pool-watch-thread" #54 daemon prio=5 os_prio=0
tid=0x7f92427d2800 nid=0x1966 waiting on condition [0x7f91bc738000]