[jira] [Created] (CARBONDATA-855) Can't update successfully.
sehriff created CARBONDATA-855:
----------------------------------

             Summary: Can't update successfully.
                 Key: CARBONDATA-855
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-855
             Project: CarbonData
          Issue Type: Bug
         Environment: spark1.6.0 carbon1.0.0
            Reporter: sehriff
         Attachments: metadataupdate.txt, updatefail1.txt

I can't update a CarbonData table, either with cc.sql("update ...").show() or as a Hive table from the hive shell (update hive table set ... where ...). Most of the log output from executing cc.sql [attachment updatefail1.txt] is INFO, which looks normal, but the update does not actually succeed: the values that should be updated remain unchanged.

Because CarbonContext extends HiveContext, I was wondering whether I should first change the Hive configuration so that updating a Hive table from the hive shell works, and then try updating the CarbonData table again.

Also, I don't see any update delta files in HDFS, only tableupdatestatus files under the metadata directory [attachment metadataupdate.txt]. Maybe there is some configuration that needs to be set in HDFS?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
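For anyone triaging from the archive: below is a minimal spark-shell sketch of the update path being described, with hypothetical table and column names (t1, c1, c2); the parenthesized SET form follows the CarbonData IUD syntax of this era and is worth comparing against what was actually run.

// Hypothetical names throughout; assumes a CarbonContext `cc` from a
// Spark 1.6 / Carbon 1.0 session as in the report.
cc.sql("UPDATE t1 SET (c1) = ('newValue') WHERE c2 = 'key'").show()

// On success, update delta files should appear inside the segment
// directories, in addition to the tableupdatestatus file the reporter
// already sees under the Metadata directory.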
[jira] [Created] (CARBONDATA-854) Carbondata with Datastax / Cassandra
Sanoj MG created CARBONDATA-854:
-----------------------------------

             Summary: Carbondata with Datastax / Cassandra
                 Key: CARBONDATA-854
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-854
             Project: CarbonData
          Issue Type: Improvement
          Components: spark-integration
    Affects Versions: 1.1.0-incubating
         Environment: Datastax DSE 5.0 ( DSE analytics )
            Reporter: Sanoj MG
            Priority: Minor
             Fix For: 1.1.0-incubating

I am trying to get Carbondata working in a Datastax DSE 5.0 cluster. An exception is thrown while trying to create a Carbondata table from the spark shell. Below are the steps:

scala> import com.datastax.spark.connector._
scala> import org.apache.spark.sql.SaveMode
scala> import org.apache.spark.sql.CarbonContext
scala> import org.apache.spark.sql.types._
scala> val cc = new CarbonContext(sc, "cfs://127.0.0.1/opt/CarbonStore")
scala> val df = cc.read.parquet("file:///home/cassandra/testdata-30day/cassandra/zone.parquet")
scala> df.write.format("carbondata").option("tableName", "zone").option("compress", "true").option("TempCSV","false").mode(SaveMode.Overwrite).save()

The exception below is thrown and the Carbondata table is not created.

java.io.FileNotFoundException: /opt/CarbonStore/default/zone/Metadata/schema (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:207)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:84)
	at org.apache.spark.sql.hive.CarbonMetastore.createTableFromThrift(CarbonMetastore.scala:293)
	at org.apache.spark.sql.execution.command.CreateTable.run(carbonTableSchema.scala:163)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
	at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
	at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:39)
	at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:109)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
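Worth flagging when reading the trace: the store URI handed to CarbonContext uses the cfs:// scheme, but the path in the FileNotFoundException is a bare local path, which suggests FileFactory fell back to local file I/O for the unrecognized scheme. A hedged probe from the same shell is sketched below; FileFactory is the class named in the trace, but the getFileType helper and its exact signature should be treated as an assumption against this Carbon version.

// Probe sketch: an unrecognized scheme such as cfs:// is expected to map
// to the LOCAL file type, while hdfs:// maps to HDFS.
import org.apache.carbondata.core.datastore.impl.FileFactory

FileFactory.getFileType("cfs://127.0.0.1/opt/CarbonStore")   // likely LOCAL
FileFactory.getFileType("hdfs://127.0.0.1/opt/CarbonStore")  // HDFS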
[jira] [Created] (CARBONDATA-852) Less than or equal to operator(<=) does not work properly in Range Filter.
Vinod Rohilla created CARBONDATA-852:
----------------------------------------

             Summary: Less than or equal to operator(<=) does not work properly in Range Filter.
                 Key: CARBONDATA-852
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-852
             Project: CarbonData
          Issue Type: Bug
          Components: data-load
    Affects Versions: 1.1.0-incubating
         Environment: Spark 2.1
            Reporter: Vinod Rohilla
            Priority: Minor

The less than or equal to operator (<=) does not work properly in a range filter.

Steps to reproduce:

1) Create a table:

CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB");

2) Load data into the table:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Run the query:

select dob from uniqdata where dob <= '1972-12-10' and dob >= '1972-12-01';

4) Result on beeline:

+------------------------+
|          dob           |
+------------------------+
| 1972-12-01 01:00:03.0  |
| 1972-12-02 01:00:03.0  |
| 1972-12-03 01:00:03.0  |
| 1972-12-04 01:00:03.0  |
| 1972-12-05 01:00:03.0  |
| 1972-12-06 01:00:03.0  |
| 1972-12-07 01:00:03.0  |
| 1972-12-08 01:00:03.0  |
| 1972-12-09 01:00:03.0  |
+------------------------+

Expected result: "1972-12-10" should be included in the result set.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
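A note for anyone triaging: in Spark SQL a bare date string compared against a timestamp column is generally cast to midnight of that day, so '1972-12-10' becomes 1972-12-10 00:00:00 and rows stamped 01:00:03 on that day fall outside the upper bound even if <= itself is handled correctly. Whether that makes this a bug or expected cast semantics is for the triager to decide; the hedged spark-shell sketch below bounds the whole day unambiguously (it assumes a Spark 2.1 session `spark` with CarbonData configured and the uniqdata table from the report already loaded).

// The reported predicate: the upper bound likely casts to midnight of
// 1972-12-10, excluding the 01:00:03 row on that day.
spark.sql("select dob from uniqdata where dob <= '1972-12-10' and dob >= '1972-12-01'").show()

// An upper bound that unambiguously covers all of 1972-12-10:
spark.sql("select dob from uniqdata where dob < '1972-12-11' and dob >= '1972-12-01'").show()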
Carbondata with Datastax / Cassandra
Hi All,

We have a Datastax/Cassandra cluster and I am trying to see if I can get Carbondata working there. Below are the steps that I tried in the spark shell.

scala> import com.datastax.spark.connector._
scala> import org.apache.spark.sql.SaveMode
scala> import org.apache.spark.sql.CarbonContext
scala> import org.apache.spark.sql.types._
scala> val cc = new CarbonContext(sc, "cfs://127.0.0.1/opt/CarbonStore")
scala> val df = cc.read.parquet("file:///home/cassandra/testdata-30day/cassandra/zone.parquet")
scala> df.write.format("carbondata").option("tableName", "zone").option("compress", "true").option("TempCSV","false").mode(SaveMode.Overwrite).save()

The exception below is thrown and it fails to create the Carbondata table. The full stack trace is attached. I would appreciate any pointers on where to look.

==
java.io.FileNotFoundException: /opt/CarbonStore/default/zone/Metadata/schema (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:207)
	at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:84)
	at org.apache.spark.sql.hive.CarbonMetastore.createTableFromThrift(CarbonMetastore.scala:293)
	at org.apache.spark.sql.execution.command.CreateTable.run(carbonTableSchema.scala:163)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
	at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
	at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:39)
	at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:109)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
==

Thanks,
Sanoj

cassandra@sanoj-OptiPlex-990:~/single-carbon/dse-5.0.4$ ./bin/dse spark
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/cassandra/single-carbon/dse-5.0.4/lib/carbondata_2.10-1.1.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/cassandra/single-carbon/dse-5.0.4/resources/cassandra/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/04 16:17:31 INFO deploy.DseSparkSubmitBootstrapper: DSE Spark
17/04/04 16:17:32 WARN core.NettyUtil: Found Netty's native epoll transport in the classpath, but epoll is not available. Using NIO instead.
17/04/04 16:17:33 INFO core.Cluster: New Cassandra host /127.0.0.1:9042 added
17/04/04 16:17:33 INFO cql.CassandraConnector: Connected to Cassandra cluster: Test Cluster
17/04/04 16:17:33 INFO deploy.SparkNodeConfiguration: Trying to setup a server socket at /10.33.31.29:34923 to verify connectivity with DSE node...
17/04/04 16:17:33 INFO deploy.SparkNodeConfiguration: Successfully verified DSE Node -> this application connectivity on random port (34923)
17/04/04 16:17:33 INFO deploy.DseSparkSubmitBootstrapper: Starting Spark driver using SparkSubmit

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
Initializing SparkContext with MASTER: spark://127.0.0.1:7077
17/04/04 16:17:36 INFO spark.SparkContext: Running Spark version 1.6.2
17/04/04 16:17:36 INFO
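One observation on the transcript above, offered as a guess rather than a diagnosis: the store URI passed to CarbonContext is cfs://..., yet the failing path in the exception is a bare local one, so the cfs scheme appears to have been mapped to local file I/O. For a single-node experiment, a hedged workaround is to lean into that and hand CarbonContext a local store directory that actually exists and is writable (the path and ownership below are assumptions):

// Single-node experiment only; create the directory first, e.g.:
//   mkdir -p /opt/CarbonStore && sudo chown cassandra /opt/CarbonStore
import org.apache.spark.sql.CarbonContext

// A plain local path, since cfs:// seems to be treated as local anyway.
val cc = new CarbonContext(sc, "/opt/CarbonStore")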
Re: Dimension column of integer type - to exclude from dictionary
Hi Liang,

On Tue, Apr 4, 2017 at 2:55 PM, Liang Chen wrote:

> Hi Sanoj
>
> First, let me see if I understand your requirement: you only want to build
> an index for column "Account", but don't want to build a dictionary for
> column "Account", is that right?

Yes, this is right. In our ETL pipeline we have many dimension columns / surrogate keys of integer type. I want to build an index for these columns, and will try it as David suggested.

> If my understanding above is right, then the "SORT_COLUMNS" feature David
> mentioned will satisfy your requirements.
>
> Currently, you can only do it like this:
> First change column "Account" to String type from Integer, then use
> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')

I thought of doing this, but I don't really like it, since I will have to pad zeros for the comparison operators to work. Also, I will have to cast it back if I need to load it into another system.

Another point: in our star schema there are many low-cardinality surrogate keys of int type as well. These are indeed dimension columns that need an index, but dictionary encoding may not give any benefit.

Thanks,
Sanoj

> Regards
> Liang
>
>
> Sanoj MG wrote
> > Hi All,
> >
> > I have a dimension column of integer type. Since the cardinality of this
> > column is relatively high, I want to exclude it from the dictionary for
> > faster loading. Is there any way to do this in Carbondata DDL?
> >
> > When I use TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account'), Account will be
> > defined as a dimension, but it will also be included in the dictionary.
> >
> >
> > Thanks,
> > Sanoj
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p10008.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
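For readers coming to this thread from the archive, here is a hedged sketch of the SORT_COLUMNS route being discussed; the table and the columns other than Account are hypothetical, and the property's availability depends on which CarbonData release ships the feature David and Liang refer to.

// Hypothetical fact table: Account stays an int, is part of the sort/index
// columns, and no dictionary is requested for it.
// Assumes a CarbonContext `cc` as in the other threads in this digest.
cc.sql("""
  CREATE TABLE sales (
    Account INT,
    Amount DOUBLE,
    TxnDate TIMESTAMP
  )
  STORED BY 'org.apache.carbondata.format'
  TBLPROPERTIES ('SORT_COLUMNS'='Account')
""")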
Re: Dimension column of integer type - to exclude from dictionary
Hi Sanoj

First, let me see if I understand your requirement: you only want to build an index for column "Account", but don't want to build a dictionary for column "Account", is that right?

If my understanding above is right, then the "SORT_COLUMNS" feature David mentioned will satisfy your requirements.

Currently, you can only do it like this:
First change column "Account" to String type from Integer, then use TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')

Regards
Liang

Sanoj MG wrote
> Hi All,
>
> I have a dimension column of integer type. Since the cardinality of this
> column is relatively high, I want to exclude it from the dictionary for
> faster loading. Is there any way to do this in Carbondata DDL?
>
> When I use TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account'), Account will be
> defined as a dimension, but it will also be included in the dictionary.
>
>
> Thanks,
> Sanoj

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p10008.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
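A hedged sketch of the workaround Liang describes, with a hypothetical table: Account is declared as String so that DICTIONARY_EXCLUDE applies, making it a no-dictionary dimension. As Sanoj notes in his reply, the cost is zero-padding for comparison operators and casting back when exporting.

// Workaround sketch per the suggestion above; everything except the
// Account column name is hypothetical. Assumes a CarbonContext `cc`.
cc.sql("""
  CREATE TABLE accounts (
    Account STRING,
    Amount DOUBLE
  )
  STORED BY 'org.apache.carbondata.format'
  TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')
""")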
[jira] [Created] (CARBONDATA-850) Fix the comment definition issues of CarbonData thrift files
Liang Chen created CARBONDATA-850:
-------------------------------------

             Summary: Fix the comment definition issues of CarbonData thrift files
                 Key: CARBONDATA-850
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-850
             Project: CarbonData
          Issue Type: Bug
          Components: file-format
            Reporter: Liang Chen
            Assignee: Liang Chen
            Priority: Minor
             Fix For: 1.1.0-incubating

Fix the comment definition issues of the CarbonData thrift files, to help users understand the CarbonData file format more easily.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (CARBONDATA-849) if alter table ddl is executed on non existing table, then error message is wrong.
ravikiran created CARBONDATA-849:
------------------------------------

             Summary: if alter table ddl is executed on non existing table, then error message is wrong.
                 Key: CARBONDATA-849
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-849
             Project: CarbonData
          Issue Type: Bug
          Components: sql
            Reporter: ravikiran
            Assignee: ravikiran
            Priority: Minor

The error message produced when running alter on a non-existing table is:

Exception in thread "main" org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Unsupported alter operation on hive table

but this is not correct. Hive has blocked the alter DDL on its tables, so Carbon should be consistent with Hive.

Correct message: Operation not allowed: alter table name compact 'minor'

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
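A minimal reproduction sketch, assuming a CarbonContext `cc` as in the other threads in this digest; the table name is deliberately one that was never created, and the compaction DDL is the one quoted in the report.

// Expected per the report: instead of an "Operation not allowed" /
// table-not-found style error, this currently surfaces the misleading
// "Unsupported alter operation on hive table" message.
cc.sql("ALTER TABLE no_such_table COMPACT 'MINOR'")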
[jira] [Created] (CARBONDATA-848) Select count(*) from table gives an exception in Presto
Bhavya Aggarwal created CARBONDATA-848:
------------------------------------------

             Summary: Select count(*) from table gives an exception in Presto
                 Key: CARBONDATA-848
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-848
             Project: CarbonData
          Issue Type: Bug
          Components: presto-integration
            Reporter: Bhavya Aggarwal
            Assignee: Bhavya Aggarwal

A select count(*) query throws an ArrayIndexOutOfBoundsException in the Presto connector.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)