[jira] [Created] (CARBONDATA-855) Can't update successfully.

2017-04-04 Thread sehriff (JIRA)
sehriff created CARBONDATA-855:
--

 Summary: Can't update successfully.
 Key: CARBONDATA-855
 URL: https://issues.apache.org/jira/browse/CARBONDATA-855
 Project: CarbonData
  Issue Type: Bug
 Environment: spark1.6.0  carbon1.0.0
Reporter: sehriff
 Attachments: metadataupdate.txt, updatefail1.txt

I can't update a CarbonData table, neither with cc.sql("update ...").show nor by updating the 
corresponding Hive table from the hive shell (update hive table set ... where ...). Most of the 
log output from executing cc.sql (attachment updatefail1.txt) is at INFO level, which looks 
normal, but the update does not actually take effect: the values that should be updated remain 
unchanged.
Because CarbonContext extends HiveContext, I was wondering whether I should first change the 
Hive configuration so that updating a Hive table from the hive shell works, and then retry 
updating the CarbonData table.
Also, I don't see any update delta files in HDFS, only tableupdatestatus files under the 
Metadata directory (attachment metadataupdate.txt). Is there some additional configuration 
that should be set up in HDFS?
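
For reference, a minimal sketch of the kind of statement being attempted (the table and column 
names here are hypothetical, not taken from the attachments); CarbonData's update syntax takes 
the new values in parentheses:

UPDATE t_carbon SET (name) = ('newValue') WHERE id = 1;
-- or, from the spark shell:
-- cc.sql("UPDATE t_carbon SET (name) = ('newValue') WHERE id = 1").show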



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-854) Carbondata with Datastax / Cassandra

2017-04-04 Thread Sanoj MG (JIRA)
Sanoj MG created CARBONDATA-854:
---

 Summary: Carbondata with Datastax / Cassandra
 Key: CARBONDATA-854
 URL: https://issues.apache.org/jira/browse/CARBONDATA-854
 Project: CarbonData
  Issue Type: Improvement
  Components: spark-integration
Affects Versions: 1.1.0-incubating
 Environment: Datastax DSE 5.0 ( DSE analytics )
Reporter: Sanoj MG
Priority: Minor
 Fix For: 1.1.0-incubating


I am trying to get Carbondata working in a Datastax DSE 5.0 cluster. 

An exception is thrown while trying to create a CarbonData table from the spark shell. Below 
are the steps: 

scala> import com.datastax.spark.connector._
scala> import org.apache.spark.sql.SaveMode
scala> import org.apache.spark.sql.CarbonContext
scala> import org.apache.spark.sql.types._

scala> val cc = new CarbonContext(sc, "cfs://127.0.0.1/opt/CarbonStore")

scala> val df = 
cc.read.parquet("file:///home/cassandra/testdata-30day/cassandra/zone.parquet")

scala> df.write.format("carbondata").option("tableName", 
"zone").option("compress", 
"true").option("TempCSV","false").mode(SaveMode.Overwrite).save()

The exception below is thrown and the CarbonData table is not created. 

java.io.FileNotFoundException: /opt/CarbonStore/default/zone/Metadata/schema (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:207)
    at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:84)
    at org.apache.spark.sql.hive.CarbonMetastore.createTableFromThrift(CarbonMetastore.scala:293)
    at org.apache.spark.sql.execution.command.CreateTable.run(carbonTableSchema.scala:163)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:39)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:109)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-852) Less than or equal to operator(<=) does not work properly in Range Filter.

2017-04-04 Thread Vinod Rohilla (JIRA)
Vinod Rohilla created CARBONDATA-852:


 Summary: Less than or equal to operator(<=) does not work properly 
in Range Filter.
 Key: CARBONDATA-852
 URL: https://issues.apache.org/jira/browse/CARBONDATA-852
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.1.0-incubating
 Environment: Spark 2-1
Reporter: Vinod Rohilla
Priority: Minor


The less-than-or-equal-to operator (<=) does not work properly in a range filter.

Steps to reproduce:

1) Create the table:
CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= 
"256 MB");

2) Load data into the table:
LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/uniqdata/2000_UniqData.csv' into 
table uniqdata OPTIONS('DELIMITER'=',' , 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

3) Run the query:
select dob from uniqdata where dob <= '1972-12-10' and dob >= '1972-12-01';

4) Result on beeline:
+------------------------+
|          dob           |
+------------------------+
| 1972-12-01 01:00:03.0  |
| 1972-12-02 01:00:03.0  |
| 1972-12-03 01:00:03.0  |
| 1972-12-04 01:00:03.0  |
| 1972-12-05 01:00:03.0  |
| 1972-12-06 01:00:03.0  |
| 1972-12-07 01:00:03.0  |
| 1972-12-08 01:00:03.0  |
| 1972-12-09 01:00:03.0  |
+------------------------+

Expected result: the row for 1972-12-10 should be included in the result set.
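
For comparison, a sketch of an equivalent query with an explicit upper bound on the same day 
(the 23:59:59 literal is an assumption about the intended range, not part of the original 
report); if this query returns the 1972-12-10 row, the problem is specific to how the 
date-only literal is interpreted by the range filter:

select dob from uniqdata where dob >= '1972-12-01' and dob <= '1972-12-10 23:59:59';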



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Carbondata with Datastax / Cassandra

2017-04-04 Thread Sanoj MG
Hi All,

We have a Datastax/cassandra cluster and I am trying to see if I can get
Carbondata working there.

Below are the steps that I tried in spark shell.


scala> import com.datastax.spark.connector._
scala> import org.apache.spark.sql.SaveMode
scala> import org.apache.spark.sql.CarbonContext
scala> import org.apache.spark.sql.types._

scala> val cc = new CarbonContext(sc, "cfs://127.0.0.1/opt/CarbonStore")

scala> val df =
cc.read.parquet("file:///home/cassandra/testdata-30day/cassandra/zone.parquet")

scala> df.write.format("carbondata").option("tableName",
"zone").option("compress",
"true").option("TempCSV","false").mode(SaveMode.Overwrite).save()

The exception below is thrown and the CarbonData table is not created.

The full stack trace is attached. I'd appreciate it if someone could give me pointers
on where to look.

==

java.io.FileNotFoundException: /opt/CarbonStore/default/zone/Metadata/schema (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:207)
    at org.apache.carbondata.core.writer.ThriftWriter.open(ThriftWriter.java:84)
    at org.apache.spark.sql.hive.CarbonMetastore.createTableFromThrift(CarbonMetastore.scala:293)
    at org.apache.spark.sql.execution.command.CreateTable.run(carbonTableSchema.scala:163)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.saveAsCarbonFile(CarbonDataFrameWriter.scala:39)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:109)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)


==


Thanks,

Sanoj

cassandra@sanoj-OptiPlex-990:~/single-carbon/dse-5.0.4$ ./bin/dse spark
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/cassandra/single-carbon/dse-5.0.4/lib/carbondata_2.10-1.1.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/cassandra/single-carbon/dse-5.0.4/resources/cassandra/lib/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/04 16:17:31 INFO deploy.DseSparkSubmitBootstrapper: DSE Spark
17/04/04 16:17:32 WARN core.NettyUtil: Found Netty's native epoll transport in 
the classpath, but epoll is not available. Using NIO instead.
17/04/04 16:17:33 INFO core.Cluster: New Cassandra host /127.0.0.1:9042 added
17/04/04 16:17:33 INFO cql.CassandraConnector: Connected to Cassandra cluster: 
Test Cluster
17/04/04 16:17:33 INFO deploy.SparkNodeConfiguration: Trying to setup a server 
socket at /10.33.31.29:34923 to verify connectivity with DSE node...
17/04/04 16:17:33 INFO deploy.SparkNodeConfiguration: Successfully verified DSE 
Node -> this application connectivity on random port (34923)
17/04/04 16:17:33 INFO deploy.DseSparkSubmitBootstrapper: Starting Spark driver 
using SparkSubmit
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
Initializing SparkContext with MASTER: spark://127.0.0.1:7077
17/04/04 16:17:36 INFO spark.SparkContext: Running Spark version 1.6.2
17/04/04 16:17:36 INFO 

Re: Dimension column of integer type - to exclude from dictionary

2017-04-04 Thread Sanoj MG
Hi Liang,

On Tue, Apr 4, 2017 at 2:55 PM, Liang Chen  wrote:

> Hi Sanoj
>
> First, let me check that I understand your requirement: you want to build an
> index for the column "Account", but you don't want to build a dictionary for
> it, is that right?
>

Yes, that's right. In our ETL pipeline we have many dimension columns /
surrogate keys of integer type. I want to build indexes for these columns and
will try SORT_COLUMNS as David suggested.


> If my understanding is correct, then the "SORT_COLUMNS" feature David
> mentioned will satisfy your requirement.
>
> Currently, you can only do it like this: first change the "Account" column
> from Integer to String type, then use
> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')
>

I thought of doing this, but I don't really like it, since I would have to pad
the values with zeros for comparison operators to work. I would also have to
cast the column back if I need to load it into another system.

Another point: in our star schema there are also many low-cardinality
surrogate keys of int type. These are indeed dimension columns that need an
index, but dictionary encoding may not give any benefit for them.

Thanks,
Sanoj



> Regards
> Liang
>
>
> Sanoj MG wrote
> > Hi All,
> >
> > I have a dimension column of integer type. Since the cardinality of this
> > column is relatively high, I want to exclude it from the dictionary for
> > faster loading. Is there any way to do this in Carbondata DDL?
> >
> > When I use TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account'), Account will
> be
> > defined as a dimension, but it will also be included in the dictionary.
> >
> >
> > Thanks,
> > Sanoj
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p10008.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>


Re: Dimension column of integer type - to exclude from dictionary

2017-04-04 Thread Liang Chen
Hi Sanoj

First, let me check that I understand your requirement: you want to build an index
for the column "Account", but you don't want to build a dictionary for it, is that right?
If my understanding is correct, then the "SORT_COLUMNS" feature David mentioned
will satisfy your requirement.

Currently, you can only do it like this: first change the "Account" column from
Integer to String type, then use
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')
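
As a rough sketch of the two options (the table and extra columns here are hypothetical, and
the first option assumes a CarbonData version with SORT_COLUMNS support):

-- Option 1: keep Account as INT and put it in SORT_COLUMNS, so it is sorted/indexed
-- without being dictionary encoded (hypothetical table definition)
CREATE TABLE fact_sales (Account INT, amount DOUBLE)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES ('SORT_COLUMNS'='Account');

-- Option 2: change Account to STRING and exclude it from the dictionary
CREATE TABLE fact_sales_str (Account STRING, amount DOUBLE)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account');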

Regards
Liang


Sanoj MG wrote
> Hi All,
> 
> I have a dimension column of integer type. Since the cardinality of this
> column is relatively high, I want to exclude it from the dictionary for
> faster loading. Is there any way to do this in Carbondata DDL?
> 
> When I use TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account'), Account will be
> defined as a dimension, but it will also be included in the dictionary.
> 
> 
> Thanks,
> Sanoj





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p10008.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


[jira] [Created] (CARBONDATA-850) Fix the comment definition issues of CarbonData thrift files

2017-04-04 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-850:
-

 Summary: Fix the comment definition issues of CarbonData thrift 
files
 Key: CARBONDATA-850
 URL: https://issues.apache.org/jira/browse/CARBONDATA-850
 Project: CarbonData
  Issue Type: Bug
  Components: file-format
Reporter: Liang Chen
Assignee: Liang Chen
Priority: Minor
 Fix For: 1.1.0-incubating


Fix the comment definition issues in the CarbonData thrift files, to help users
understand the CarbonData file format more easily.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-849) if alter table ddl is executed on non existing table, then error message is wrong.

2017-04-04 Thread ravikiran (JIRA)
ravikiran created CARBONDATA-849:


 Summary: if alter table ddl is executed on non existing table, 
then error message is wrong.
 Key: CARBONDATA-849
 URL: https://issues.apache.org/jira/browse/CARBONDATA-849
 Project: CarbonData
  Issue Type: Bug
  Components: sql
Reporter: ravikiran
Assignee: ravikiran
Priority: Minor


The error message shown when running ALTER on a non-existing table is:

Exception in thread "main" org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Unsupported alter operation on hive table

But this is not correct.

The ALTER DDL is blocked for Hive tables, so Carbon should be consistent with Hive.

Correct message:
Operation not allowed: alter table name compact 'minor'
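
For reference, a minimal repro sketch (the table name below is hypothetical; any table that
does not exist will do):

-- 'no_such_table' does not exist; this currently reports the misleading
-- "Unsupported alter operation on hive table" message
ALTER TABLE no_such_table COMPACT 'minor';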



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-848) Select count(*) from table gives an exception in Presto

2017-04-04 Thread Bhavya Aggarwal (JIRA)
Bhavya Aggarwal created CARBONDATA-848:
--

 Summary: Select count(*) from table gives an exception in Presto
 Key: CARBONDATA-848
 URL: https://issues.apache.org/jira/browse/CARBONDATA-848
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Reporter: Bhavya Aggarwal
Assignee: Bhavya Aggarwal


A select count(*) query throws an ArrayIndexOutOfBoundsException in the Presto connector.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)