[jira] [Created] (CARBONDATA-676) Code clean

2017-01-21 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-676:
--

 Summary: Code clean
 Key: CARBONDATA-676
 URL: https://issues.apache.org/jira/browse/CARBONDATA-676
 Project: CarbonData
  Issue Type: Improvement
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor


Clean up some code:
Correct spelling mistakes.
Remove unused functions.
Iterate over the Array directly instead of transforming it to a List.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-451) Can not run query on windows now

2016-11-26 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-451:
--

 Summary: Can not run query on windows now
 Key: CARBONDATA-451
 URL: https://issues.apache.org/jira/browse/CARBONDATA-451
 Project: CarbonData
  Issue Type: Bug
  Components: core
Reporter: zhangshunyu
Assignee: zhangshunyu
 Fix For: 0.2.0-incubating


On Windows, the tablePath contains '/' that is not replaced before the 
substring operation, so executing a query throws an error.
I have fixed this and will raise a PR.
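
The message does not show the fix itself. As a rough sketch of the kind of problem described, the Java snippet below normalizes path separators before taking a substring, so that a store path and a table path written with different separators still line up. The class and method names are hypothetical and are not taken from CarbonData.

```java
final class TablePathExample {
  /** Hypothetical helper: rewrite Windows-style '\' separators as '/'. */
  static String normalize(String path) {
    return path.replace('\\', '/');
  }

  /**
   * Hypothetical helper: return the part of tablePath below storePath.
   * Both paths are normalized first, so mixed separators cannot shift
   * the substring offset or make the prefix check fail.
   */
  static String relativeTo(String storePath, String tablePath) {
    String store = normalize(storePath);
    String table = normalize(tablePath);
    if (!table.startsWith(store)) {
      throw new IllegalArgumentException(tablePath + " is not under " + storePath);
    }
    String rest = table.substring(store.length());
    // Drop the separator that joins the store path and the relative part.
    return rest.startsWith("/") ? rest.substring(1) : rest;
  }
}
```

Without the normalization step, a prefix check or substring offset computed from one separator style would not match a path written in the other.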





[GitHub] incubator-carbondata pull request #319: [CARBONDATA-411] Test

2016-11-15 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/319

[CARBONDATA-411] Test

test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata a

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #319


commit e57977da9fa64e87d1e54f84c35ad718a7701ec9
Author: zhaow <zhaow@zhaowdemacbook-pro.local>
Date:   2016-11-16T01:19:16Z

add sth




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-411) test

2016-11-15 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-411:
--

 Summary: test
 Key: CARBONDATA-411
 URL: https://issues.apache.org/jira/browse/CARBONDATA-411
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating








[GitHub] incubator-carbondata pull request #309: [CARBONDATA-402] support CreateAsSel...

2016-11-11 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/309#discussion_r87588006
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1375,6 +1376,24 @@ private[sql] case class ShowLoads(
 
 }
 
+private[sql] case class CreateCarbonTableAsSelect(
+databaseName: Option[String],
+tableName: String,
+allowExisting: Boolean,
+createSql: String) extends RunnableCommand {
+
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+val dbName = getDB.getDatabaseName(databaseName, sqlContext)
+val subQueryIndex = createSql.toUpperCase.indexOf("SELECT")
--- End diff --

When the table name is something like “selectabcd”, would the index still be 
the right one?
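
To illustrate the concern, here is a small Java sketch (the diff itself is Scala). It compares the plain `indexOf("SELECT")` lookup with a word-boundary regex. The class and method names are made up for this example, and the regex variant is only one possible alternative, not what the pull request adopted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SelectIndexExample {
  // Matches SELECT only as a standalone word, ignoring case.
  private static final Pattern SELECT_KEYWORD =
      Pattern.compile("\\bSELECT\\b", Pattern.CASE_INSENSITIVE);

  /** Same idea as the diff: first occurrence of "SELECT" anywhere in the text. */
  static int naiveIndex(String sql) {
    return sql.toUpperCase().indexOf("SELECT");
  }

  /** Index of the first standalone SELECT keyword, or -1 if there is none. */
  static int keywordIndex(String sql) {
    Matcher m = SELECT_KEYWORD.matcher(sql);
    return m.find() ? m.start() : -1;
  }
}
```

For `CREATE TABLE selectabcd AS SELECT * FROM t`, `naiveIndex` points into the table name, while `keywordIndex` points at the actual sub-query. The regex still does not handle quoted identifiers or string literals that contain the word, so a real parser rule would be more robust.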




[GitHub] incubator-carbondata pull request #259: Fix constants and method names

2016-10-26 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/259

Fix constants and method names

## Why raise this pr?
To rename some constants and method names. For example:
It is hard to tell what the parameter 'carbon.number.of.cores' is used for: 
cores for what?
It is hard to tell what the method 'getNumberOfCores' is used for: query cores 
or load cores?
etc.
## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata constants

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #259


commit 8a3c1b4758a93d7e5b7c1d983f9a9309995f4c79
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-10-26T13:53:21Z

Fix constans






[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...

2016-10-23 Thread Zhangshunyu
GitHub user Zhangshunyu reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/222

[CARBONDATA-221] Fix the bug of inverted index that store inverted index in 
metadata by using Encoding.INVERTED_INDEX.

## Why raise this pr?
1. Problem: In the current code, the inverted index setting in the DDL info is 
not persisted to the store, so queries might mismatch after the cluster restarts.
2. To work around problem 1, the current code always sets the inverted index to 
true, so the inverted index can no longer be configured. We should fix this 
problem at its root cause.

## How to solve?
Use the Encoding as the identifier to check whether a column uses the inverted 
index. This Encoding is already part of the thrift format, so we do not need to 
modify the thrift format.

This is the same as the query logic in 
CompressedDimensionChunkFileBasedReader:
```
if (CarbonUtil.hasEncoding(dimensionColumnChunk.get(blockIndex).getEncodingList(),
    Encoding.INVERTED_INDEX)) {
  invertedIndexes = CarbonUtil
      .getUnCompressColumnIndex(dimensionColumnChunk.get(blockIndex).getRowIdPageLength(),
          fileReader.readByteArray(filePath,
              dimensionColumnChunk.get(blockIndex).getRowIdPageOffset(),
              dimensionColumnChunk.get(blockIndex).getRowIdPageLength()), numberComressor);
  // get the reverse index
  invertedIndexesReverse = getInvertedReverseIndex(invertedIndexes);
}
```
It also uses Encoding.INVERTED_INDEX to check whether a column uses the 
inverted index.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata fix_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #222


commit c27a8a9e33529e53020c477c70d0c079724070d2
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:03Z

Save useInvertedIndex info into thrift store

commit 3c8da81869e1a8eca8bdde3d82bc0a9d185bdc3d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:15Z

Save useInvertedIndex info into thrift store

commit b834e4889f5c5eadcee1c232c1a6070df0c1bf60
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T09:46:12Z

Fix the judge of no_dic_col

commit e8b338c2a7a9e3e28a591bdfe57a5f704f1496d6
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T10:04:20Z

add commont






[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...

2016-10-23 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/222




[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-12 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83139950
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@Jay357089 I think it is a good idea to extract ConvertByteToReadable as a 
method, since it can be used in many logs, especially for analyzing 
performance.
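
A minimal sketch of what such a helper might look like, in Java. The method name follows the comment above, but the signature, unit labels and output format are assumptions, not the code that was eventually merged.

```java
import java.util.Locale;

final class ByteFormatExample {
  // Unit labels are illustrative; each step is a factor of 1024.
  private static final String[] UNITS = {"Byte", "KB", "MB", "GB", "TB"};

  /** Formats a size given in bytes with the largest unit that keeps the value >= 1. */
  static String convertByteToReadable(long sizeInBytes) {
    double size = sizeInBytes;
    int unit = 0;
    while (size >= 1024 && unit < UNITS.length - 1) {
      size /= 1024;
      unit++;
    }
    // Locale.ROOT keeps the decimal separator stable across machines.
    return String.format(Locale.ROOT, "%.2f %s", size, UNITS[unit]);
  }
}
```

A log statement could then call `convertByteToReadable(blockSize)` rather than printing the raw byte count.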




[GitHub] incubator-carbondata pull request #231: [CARBONDATA-311]Log the data size of...

2016-10-12 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/231

[CARBONDATA-311]Log the data size of blocklet during data load.

## Why raise this pr?
The blocklet size is an important parameter for analyzing data load and 
query, so this info should be logged.
## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata logblocklet

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #231


commit a110504f58e688e42223e896f7a1cf729463cf9d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-10-13T03:17:21Z

Log the data size of each blocklet






[jira] [Created] (CARBONDATA-311) Log the data size of blocklet during data load.

2016-10-12 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-311:
--

 Summary: Log the data size of blocklet during data load.
 Key: CARBONDATA-311
 URL: https://issues.apache.org/jira/browse/CARBONDATA-311
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 0.1.1-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating


Log the data size of blocklet during data load.





[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-12 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83027603
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 ---
@@ -252,6 +252,9 @@ private static long getMaxOfBlockAndFileSize(long 
blockSize, long fileSize) {
 if (remainder > 0) {
   maxSize = maxSize + HDFS_CHECKSUM_LENGTH - remainder;
 }
+LOGGER.info("The configured block size is " + blockSize + " byte, " +
--- End diff --

@jackylk It is set in MB, but here it has already been converted to bytes.




[GitHub] incubator-carbondata pull request #224: [CARBONDATA-239]Add scan_blocklet_nu...

2016-10-12 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/224#discussion_r82958230
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/processor/AbstractDataBlockIterator.java
 ---
@@ -127,11 +133,15 @@ protected boolean updateScanner() {
 }
   }
 
-  private AbstractScannedResult getNextScannedResult() throws 
QueryExecutionException {
+  private AbstractScannedResult 
getNextScannedResult(QueryStatisticsRecorder recorder,
--- End diff --

@sujith71955 OK, I will use a statistics model, thanks!




[GitHub] incubator-carbondata pull request #225: [CARBONDATA-295]Abstract Compressor ...

2016-10-10 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/225




[jira] [Created] (CARBONDATA-295) Abstract Snappy interface and seperate it from Compressor interface

2016-10-10 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-295:
--

 Summary: Abstract Snappy interface and seperate it from Compressor 
interface
 Key: CARBONDATA-295
 URL: https://issues.apache.org/jira/browse/CARBONDATA-295
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load
Affects Versions: 0.1.1-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating


Currently, the snappy compressor is the only one that extends the Compressor 
interface. For future expansion, we need to abstract a Snappy interface and 
separate it from the Compressor interface. That is, the Compressor interface is 
the parent of all compressors, and the SnappyCompressor interface and the other 
compressors' interfaces (or abstract classes) should extend the Compressor 
interface. Each data type handled by a compressor would then extend that 
compressor's own interface/abstract class.
For example: Compressor -> SnappyCompressor -> SnappyDoubleCompression.
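
The proposed layering could be sketched in Java roughly as follows. Only the three type names come from the issue text. The method names and signatures are assumptions made for illustration.

```java
final class CompressorHierarchyExample {
  /** Parent of all compressors: only what every compressor shares. */
  interface Compressor {
    String getName();
  }

  /** Snappy-specific layer, kept separate from the generic Compressor. */
  interface SnappyCompressor extends Compressor {
    @Override
    default String getName() {
      return "snappy";
    }
  }

  /** Data-type-specific layer for double values (method shapes are assumed). */
  interface SnappyDoubleCompression extends SnappyCompressor {
    byte[] compress(double[] values);

    double[] uncompress(byte[] compressed);
  }
}
```

With this shape, a second algorithm would add its own interface next to SnappyCompressor without touching the Snappy types.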





[jira] [Created] (CARBONDATA-293) Add scan_blocklet_num for query statistics

2016-10-09 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-293:
--

 Summary: Add scan_blocklet_num for query statistics
 Key: CARBONDATA-293
 URL: https://issues.apache.org/jira/browse/CARBONDATA-293
 Project: CarbonData
  Issue Type: Improvement
  Components: data-query
Affects Versions: 0.1.1-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
 Fix For: 0.2.0-incubating


Add scan_blocklet_num for query statistics





[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...

2016-10-09 Thread Zhangshunyu
GitHub user Zhangshunyu reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/204

[CARBONDATA-280]Fix the bug that when table properties is repeated it only 
set the last one

## Why raise this pr?
When a table property is repeated, only the last one is set. For example:
```
CREATE TABLE IF NOT EXISTS carbontable
(ID Int, date Timestamp, country String,
name String, phonetype String, serialname String, salary Int)
STORED BY 'carbondata'
TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID',
'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
```
Because we use a map to store the properties, only salary is set to 
DICTIONARY_INCLUDE and only phonetype is set to DICTIONARY_EXCLUDE.

## How to solve?
**We should do a strict syntax check that requires 
'DICTIONARY_EXCLUDE'='country,phonetype', 'DICTIONARY_INCLUDE'='ID,salary'**, 
and if a table property is repeated, throw a MalformedCarbonCommandException 
to tell the user that the table property is repeated, so that the user does not 
perform an erroneous operation.
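
A minimal Java sketch of the duplicate check described above. The pull request itself changes the Scala parser, and the exception class here is a hypothetical stand-in for MalformedCarbonCommandException. The input is assumed to be the already-parsed list of (key, value) pairs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TablePropertiesExample {
  /** Hypothetical stand-in for MalformedCarbonCommandException. */
  static final class RepeatedPropertyException extends RuntimeException {
    RepeatedPropertyException(String message) {
      super(message);
    }
  }

  /**
   * Builds the property map from parsed (key, value) pairs and rejects a
   * repeated key instead of silently keeping only the last value.
   */
  static Map<String, String> toMap(List<String[]> pairs) {
    Map<String, String> props = new LinkedHashMap<>();
    for (String[] pair : pairs) {
      if (props.containsKey(pair[0])) {
        throw new RepeatedPropertyException("Table property is repeated: " + pair[0]);
      }
      props.put(pair[0], pair[1]);
    }
    return props;
  }
}
```

A plain `put` without the `containsKey` check reproduces the bug: the second 'DICTIONARY_EXCLUDE' entry overwrites the first.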

## How to test?
Pass the existing test cases and the new test case for this bug.
## Test Result
CI has passed:
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/354/testReport/

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata tbprop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #204


commit 3e0030e04bff9d11f87471684b4b7b7a8d8b6209
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-28T04:50:01Z

Fix the bug that when table properties is repeated it only set the last one

commit 1828b2b78b3de9f7fa127cfcc17bf24d6c138640
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-28T05:11:12Z

Fix the test case

commit 876828400f02bb68190222e38898ccec29bb2f04
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-28T08:34:36Z

Simply

commit a7a03508b494701ec641b66449a2f0df81e2fde0
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-28T08:38:57Z

Simply






[GitHub] incubator-carbondata pull request #222: [CARBONDATA-221] Fix the bug of inve...

2016-10-09 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/222

[CARBONDATA-221] Fix the bug of inverted index that store inverted index in 
metadata.

## Why raise this pr?
1. Problem: In the current code, the inverted index setting in the DDL info is 
not persisted to the store, so queries might mismatch after the cluster restarts.
2. To work around problem 1, the current code always sets the inverted index to 
true, so the inverted index can no longer be configured, which is not 
reasonable. We should fix this problem at its root cause.

## How to solve?
Use the Encoding as the identifier to check whether a column uses the inverted 
index. This Encoding is already part of the thrift format, so we do not need to 
modify the thrift format.

## How to test?
Pass all the test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata fix_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #222


commit c27a8a9e33529e53020c477c70d0c079724070d2
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:03Z

Save useInvertedIndex info into thrift store

commit 3c8da81869e1a8eca8bdde3d82bc0a9d185bdc3d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:15Z

Save useInvertedIndex info into thrift store

commit b834e4889f5c5eadcee1c232c1a6070df0c1bf60
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T09:46:12Z

Fix the judge of no_dic_col

commit e8b338c2a7a9e3e28a591bdfe57a5f704f1496d6
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T10:04:20Z

add commont






[jira] [Created] (CARBONDATA-289) Support MB/M for table block size and update the doc about this new feature.

2016-10-08 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-289:
--

 Summary: Support MB/M for table block size and update the doc 
about this new feature. 
 Key: CARBONDATA-289
 URL: https://issues.apache.org/jira/browse/CARBONDATA-289
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 0.1.0-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating


Support MB/M for table block size and update the doc about this new feature. 
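
As a rough illustration of what accepting an MB/M suffix could mean, the Java sketch below parses a block size value written as a plain number, with "MB", or with "M". The class and method names, the case-insensitive matching, and the error handling are assumptions. The issue text does not describe the actual implementation.

```java
import java.util.Locale;

final class BlockSizeParseExample {
  /** Parses values such as "512", "512MB" or "512M" into a size in megabytes. */
  static int parseBlockSizeInMb(String value) {
    String v = value.trim().toUpperCase(Locale.ROOT);
    // Check the longer suffix first so "MB" is not cut down to a trailing "M".
    if (v.endsWith("MB")) {
      v = v.substring(0, v.length() - 2);
    } else if (v.endsWith("M")) {
      v = v.substring(0, v.length() - 1);
    }
    try {
      return Integer.parseInt(v.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid table block size: " + value, e);
    }
  }
}
```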





[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...

2016-10-07 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/204




[GitHub] incubator-carbondata pull request #52: [WIP] Support varchar datatype as SPA...

2016-09-28 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/52




Re: [Discuss]Set block_size for table on table level

2016-09-28 Thread Zhangshunyu
For each table, we can set the block size according to the data size. This is
because, when executing a query, each task processes one block at a time. When
the number of blocks is less than the parallelism, setting a reasonable block
size gives a more suitable number of blocks and makes the best use of the
parallelism.
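
The arithmetic behind this can be sketched in Java as below. The data sizes and parallelism used in the note that follows are illustrative numbers only, not measurements from CarbonData.

```java
final class BlockCountExample {
  /** Number of blocks needed to hold the data, rounding up (ceiling division). */
  static long blockCount(long dataSizeMb, long blockSizeMb) {
    return (dataSizeMb + blockSizeMb - 1) / blockSizeMb;
  }

  /** Tasks that can run at once: one block per task, capped by the parallelism. */
  static long busyTasks(long dataSizeMb, long blockSizeMb, long parallelism) {
    return Math.min(blockCount(dataSizeMb, blockSizeMb), parallelism);
  }
}
```

For a hypothetical 4096 MB table and a parallelism of 16, a 1024 MB block size yields 4 blocks, so only 4 tasks have work. A 256 MB block size yields 16 blocks, which keeps all 16 tasks busy.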




--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discuss-Set-block-size-for-table-on-table-level-tp1472p1538.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...

2016-09-27 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/204

[CARBONDATA-280]Fix the bug that when table properties is repeated it only 
set the last one

## Why raise this pr?
When a table property is repeated, only the last one is set. For example:
```
CREATE TABLE IF NOT EXISTS carbontable
(ID Int, date Timestamp, country String,
name String, phonetype String, serialname String, salary Int)
STORED BY 'carbondata'
TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID',
'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
```
Only salary is set to DICTIONARY_INCLUDE and only phonetype is set to 
DICTIONARY_EXCLUDE.

## How to solve?
We should do a strict syntax check, and if a table property is repeated, 
throw a MalformedCarbonCommandException to tell the user that the table 
property is repeated, so that the user does not perform an erroneous operation.

## How to test?
Pass the existing test cases and the new test case for this bug.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata tbprop

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #204


commit 3e0030e04bff9d11f87471684b4b7b7a8d8b6209
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-28T04:50:01Z

Fix the bug that when table properties is repeated it only set the last one






[jira] [Created] (CARBONDATA-280) when table properties is repeated it only set the last one

2016-09-27 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-280:
--

 Summary:  when table properties is repeated it only set the last 
one
 Key: CARBONDATA-280
 URL: https://issues.apache.org/jira/browse/CARBONDATA-280
 Project: CarbonData
  Issue Type: Bug
  Components: sql
Affects Versions: 0.1.1-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
Priority: Minor
 Fix For: 0.2.0-incubating


When a table property is repeated, only the last one is set.
For example,
CREATE TABLE IF NOT EXISTS carbontable
(ID Int, date Timestamp, country String,
name String, phonetype String, serialname String, salary Int)
STORED BY 'carbondata'
 TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID',
 'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')

Only salary is set to DICTIONARY_INCLUDE and only phonetype is set to 
DICTIONARY_EXCLUDE.





Re: Reply: [Discuss]Set block_size for table on table level

2016-09-27 Thread Zhangshunyu
I have verified that it would not affect the older tables.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discuss-Set-block-size-for-table-on-table-level-tp1472p1531.html


Re: [jira] [Created] (CARBONDATA-275) org.apache.thrift.TBaseHelper.hashCode(int) can't find this function

2016-09-27 Thread Zhangshunyu
Using thrift 0.93 can solve this problem.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/jira-Created-CARBONDATA-275-org-apache-thrift-TBaseHelper-hashCode-int-can-t-find-this-function-tp1488p1530.html


[GitHub] incubator-carbondata pull request #195: FIX CI

2016-09-23 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/195

FIX CI



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata FIXCI

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 22b1f1491d5e5306db012a7541aa30790d11cdae
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-23T08:08:16Z

FIX CI






[GitHub] incubator-carbondata pull request #191: [WIP] Change delete segments parser

2016-09-22 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/191

[WIP] Change delete segments parser




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata parser925

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #191


commit abd270f2c6114e35e0aa1da71c9b2498187357b8
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T12:53:52Z

New parser gram

commit a8d18e07f68cb469e235fd7eebca9df3630163e7
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T12:54:09Z

New parser gram






[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...

2016-09-22 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/189

[CARBONDATA-267] Set block_size for table on table level

## Why raise this pr?
To configure the block file size for each table at the table level.
## How to solve?
Add a new parameter to TableSchema. When creating a table, set it in the table 
properties and write this info into the thrift file.
## How to test?
Pass all the test cases and the new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata blocksize922

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #189


commit ed40d0f8012297cc9e9cffb3812ef5d141e03879
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T08:19:29Z

Set block_size for table on table level






[GitHub] incubator-carbondata pull request #188: [WIP] Add table_block_size on table ...

2016-09-22 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/188




[jira] [Created] (CARBONDATA-267) Set block_size for table on table level

2016-09-22 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-267:
--

 Summary: Set block_size for table on table level
 Key: CARBONDATA-267
 URL: https://issues.apache.org/jira/browse/CARBONDATA-267
 Project: CarbonData
  Issue Type: New Feature
Affects Versions: 0.1.0-incubating
Reporter: zhangshunyu
Assignee: zhangshunyu
 Fix For: 0.2.0-incubating


Set block_size for table on table level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #188: [WIP] Add table_block_size on table ...

2016-09-22 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/188

[WIP] Add table_block_size on table level.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata block_size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #188


commit a1243394de4daa6ea8fdc1024266dec2e40b45ea
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T08:43:47Z

Add delete all carbon tables

commit 5783520db18e2fdfac3ab98c9436a5ff06228988
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T09:01:46Z

Add test case

commit 602cc09b2fb6f5169267264ef9c1190717e5fae9
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T09:20:30Z

Fix test case

commit eda8f100f4fd4c42f46061cb72681df16d6cfb84
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T09:47:49Z

Fix test case

commit 847d21e3310e9c9e7424e7324064358a2b2bce5f
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T09:49:36Z

Fix test case

commit 63eb0dd21d6ca64a743ed2e3311d42a676ae48b3
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T03:03:58Z

Add a new blocksize format param

commit 78ee8a46949417bfa06b274c9f2750299aba997b
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T03:22:31Z

Add a new blocksize format param

commit eed034458f438eb1154889e53bfe85775a476abe
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T03:32:51Z

Add a new blocksize format param

commit 01565eafea966317bdfa46d52c274bd1bdc39671
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-22T07:14:04Z

Add parser level






[GitHub] incubator-carbondata pull request #185: Add a new feature that support delet...

2016-09-21 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/185

Add a new feature that support delete all carbon tables under one database.

Why raise this pr?
Add a new feature that supports deleting all carbon tables under one database.
**Only carbon tables are deleted; other tables are not affected.**

How to test?
Pass all the test cases, including the new test case.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata dropall

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #185


commit bbac6bc2db5ecc88ac8dd886108054ab22c726e4
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T08:43:47Z

Add delete all carbon tables

commit f0e922673bb3801b50fe19d9ef279ece028ebfac
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-21T09:01:46Z

Add test case






[GitHub] incubator-carbondata pull request #178: [WIP] Fix NULL values issue

2016-09-20 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/178




[GitHub] incubator-carbondata pull request #178: [WIP] Fix NULL values issue

2016-09-20 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/178

[WIP] Fix NULL values issue



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata null

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #178


commit a2b46eaf3add1fd3923e7dc1010ad5ea72d6341f
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-20T12:29:11Z

Fix






[jira] [Created] (CARBONDATA-254) Code Inspection Optiminization

2016-09-17 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-254:
--

 Summary: Code Inspection Optiminization
 Key: CARBONDATA-254
 URL: https://issues.apache.org/jira/browse/CARBONDATA-254
 Project: CarbonData
  Issue Type: Improvement
Reporter: zhangshunyu


Code Inspection Optiminization





[GitHub] incubator-carbondata pull request #171: [WIP] Code Inspection Optiminization

2016-09-17 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/171

[WIP] Code Inspection Optiminization



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata 
codeinspection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit 31aca94d187e436c30dde56e2fa438f2bc250f5d
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-18T03:00:35Z

inspection

commit 7a200303705ee87fb1543f1b9867a5251a746d09
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-18T03:17:19Z

code inspection optiminization






[GitHub] incubator-carbondata pull request #52: [CARBONDATA-104] Support varchar data...

2016-09-16 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/52




[jira] [Created] (CARBONDATA-231) Rename repeared table names in same test file and add drop tables.

2016-09-08 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-231:
--

 Summary: Rename repeared table names in same test file and add 
drop tables.
 Key: CARBONDATA-231
 URL: https://issues.apache.org/jira/browse/CARBONDATA-231
 Project: CarbonData
  Issue Type: Improvement
Reporter: zhangshunyu








[GitHub] incubator-carbondata pull request #142: [WIP][CARBONDATA-221] Fix the bug of...

2016-09-08 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/142

[WIP][CARBONDATA-221] Fix the bug of inverted index that store inverted 
index in metadata.

## Why raise this pr?
The inverted index setting from the DDL was not persisted to the store, so 
after a cluster restart queries might not match.
## How to solve?
Use the Encoding as the identifier to check whether a column uses an inverted 
index.
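A minimal sketch of the idea, not the PR's actual code: the inverted index choice is recorded as one of the column's persisted encodings, so it can be read back after a restart instead of relying on in-memory DDL state. The `Encoding` enum and both method names here are hypothetical stand-ins.

```java
import java.util.EnumSet;
import java.util.Set;

final class InvertedIndexEncodingSketch {
  // Hypothetical stand-in for the encodings stored with a column schema.
  enum Encoding { DICTIONARY, INVERTED_INDEX }

  // Build the set of encodings that would be written to the store.
  static Set<Encoding> encodingsToPersist(boolean useDictionary, boolean useInvertedIndex) {
    Set<Encoding> encodings = EnumSet.noneOf(Encoding.class);
    if (useDictionary) {
      encodings.add(Encoding.DICTIONARY);
    }
    if (useInvertedIndex) {
      encodings.add(Encoding.INVERTED_INDEX);
    }
    return encodings;
  }

  // After a restart, the persisted encodings are the only source of truth.
  static boolean usesInvertedIndex(Set<Encoding> persisted) {
    return persisted.contains(Encoding.INVERTED_INDEX);
  }
}
```

Because the flag travels with the persisted encodings, no separate in-memory flag is needed once the schema has been written.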


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #142


commit f76908319c8eae0b65ea2cd9d0f5899225c95667
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:03Z

Save useInvertedIndex info into thrift store

commit ecd5403105fd37da239db66bd313cf548a532eef
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-08T07:48:15Z

Save useInvertedIndex info into thrift store






[GitHub] incubator-carbondata pull request #129: Remove not needed parameters

2016-09-07 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/129




[GitHub] incubator-carbondata pull request #129: Remove not needed parameters

2016-09-06 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/129

Remove not needed parameters

Many parameters in CarbonCommonConstants are no longer used and should be 
removed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata mater

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #129


commit 296a7350a667a13a46c74b2b38d36ee8dc13f53f
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-05T06:33:36Z

remove not needed parameters






[GitHub] incubator-carbondata pull request #117: [WIP]Fix the bug that when subquery ...

2016-09-01 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/117

[WIP]Fix the bug that when subquery with sort and filter the result is 
empty.

## Why raise this pr?
Fix this bug: when a query has a subquery with sort and filter, it returns no 
result.
## How to solve?
For a query like this, the plan optimized by Spark never pushes down the 
filter, and as a result the sort is not decoded by carbon. When the filter is 
applied, Spark cannot resolve the int values as string values.
So we should decode them earlier when the child of the filter is a sort.
## How to test?
Added new test cases; they should pass, and CI should pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata query91

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #117


commit 31eb578927044a2d6ea8d80c487b5b5c4f73004a
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-01T07:33:01Z

Fix the bug that subquery with sort and filter the result is empty

commit 6ecc2e69172c4103fca2ff6d7173a2236be4fc03
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-09-01T07:45:54Z

Fix the bug that subquery with sort and filter the result is empty






[GitHub] incubator-carbondata pull request #110: [CARBONDATA-193]Fix the bug that neg...

2016-08-31 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/110#discussion_r76929947
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/ValueCompressionUtil.java ---
@@ -78,26 +78,26 @@ private ValueCompressionUtil() {
   private static DataType getDataType(double value, int decimal, byte dataTypeSelected) {
     DataType dataType = DataType.DATA_DOUBLE;
     if (decimal == 0) {
-      if (value < Byte.MAX_VALUE) {
+      if (value < Byte.MAX_VALUE && value > Byte.MIN_VALUE) {

This was needed before I changed the code to use the new "absMaxValue", but 
now I can delete it.




[GitHub] incubator-carbondata pull request #110: [wip][CARBONDATA-193]Fix the bug tha...

2016-08-30 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/110

[wip][CARBONDATA-193]Fix the bug that negative data compress is not 
properly when datatype is Double

## Why raise this pr?
**Fix bug: negative data is not compressed properly when the datatype is Double.**
For example, if the column datatype is double and its data looks like this:
-7489.7976
-11234567490
-11234567490
-1.2
-2
-11234567490
-11234567490
-11234567490
-11234567490
**the query result would be all 0, which is a bug.**
## How to solve?
The bug occurs because we only consider the case where the MAX value of the 
column is positive and compare it with Byte.MAX_VALUE, which is 127. A value 
such as -12343554634645 is also < Byte.MAX_VALUE, but it cannot be stored as a 
byte because it is below Byte.MIN_VALUE (-128). So we should check against both 
Byte.MAX_VALUE and Byte.MIN_VALUE, and do the same for the other datatypes.
## How to test?
Added test case: test("When the values of Double datatype are negative values").
All the existing cases and this new test case should pass.
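The two-sided range check described under "How to solve?" can be sketched as follows. This is an illustration under assumptions, not the code in `ValueCompressionUtil`: the class name, the `Width` enum, and `narrowestFor` are hypothetical, and the sketch only covers whole-number values (the `decimal == 0` case).

```java
final class IntegralWidthSketch {
  // Hypothetical stand-in for the storage types chosen by the compressor.
  enum Width { BYTE, SHORT, INT, LONG }

  // Pick the narrowest integral type that can hold both the minimum and the
  // maximum of a column. Checking only the maximum against MAX_VALUE is the
  // bug: a large negative value also compares below Byte.MAX_VALUE.
  static Width narrowestFor(double min, double max) {
    if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) {
      return Width.BYTE;
    }
    if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) {
      return Width.SHORT;
    }
    if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) {
      return Width.INT;
    }
    return Width.LONG;
  }

  public static void main(String[] args) {
    // Whole-number column whose values run from -11234567490 up to -2.
    System.out.println(narrowestFor(-11234567490d, -2d));
  }
}
```

With a one-sided check the column above would be treated as fitting in a byte; checking both bounds rejects every type narrower than a long.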

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata double92

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #110


commit 5773928ff7c7867a1925ce258fb2b57bc49513b6
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-31T01:34:12Z

Fix the bug that negtive data compress is not properly when datatype is 
Double






[jira] [Created] (CARBONDATA-193) Data is not loading properly when double data type is having negative values

2016-08-30 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-193:
--

 Summary: Data is not loading properly when double data type is 
having negative values
 Key: CARBONDATA-193
 URL: https://issues.apache.org/jira/browse/CARBONDATA-193
 Project: CarbonData
  Issue Type: Bug
Reporter: zhangshunyu


For example:
-7489.797600
-11234567489.797
-11234567489.7
-1.2
-2
-11234567489.797600
-11234567489.797600
-11234567489.797600
-11234567489.797600
would be all 0 after query





[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...

2016-08-29 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/104#discussion_r76580960
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java ---
@@ -112,25 +116,29 @@ private void initializeReader() throws IOException {
     // if already one input stream is open first we need to close and then
     // open new stream
     close();
-    // get the block offset
-    long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset();
-    FileType fileType = FileFactory
-        .getFileType(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath());
-    // calculate the end offset the block
-    long endOffset =
-        this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength() + startOffset;
-
-    // create a input stream for the block
-    DataInputStream dataInputStream = FileFactory
-        .getDataInputStream(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(),
-            fileType, bufferSize, startOffset);
-    // if start offset is not 0 then reading then reading and ignoring the extra line
-    if (startOffset != 0) {
-      LineReader lineReader = new LineReader(dataInputStream, 1);
-      startOffset += lineReader.readLine(new Text(), 0);
+
+    String path = this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath();
+    FileType fileType = FileFactory.getFileType(path);
+
+    if (path.endsWith(".gz")) {
+      DataInputStream dataInputStream =
+          FileFactory.getCompressedDataInputStream(path, fileType, bufferSize);
+      inputStreamReader = new BufferedReader(new InputStreamReader(dataInputStream));
+    } else {
+      long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset();
+      long blockLength = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength();
+      long endOffset = blockLength + startOffset;
+
+      DataInputStream dataInputStream = FileFactory.getDataInputStream(path, fileType, bufferSize);
+
+      // if start offset is not 0 then reading then reading and ignoring the extra line
+      if (startOffset != 0) {
+        LineReader lineReader = new LineReader(dataInputStream, 1);
+        startOffset += lineReader.readLine(new Text(), 0);
+      }
+      inputStreamReader = new BufferedReader(new InputStreamReader(
+          new BoundedDataStream(dataInputStream, endOffset - startOffset)));
--- End diff --

Cannot find class BoundedDataStream.




[GitHub] incubator-carbondata pull request #103: Fix the bug that when using Decimal ...

2016-08-29 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/103

Fix the bug that when using Decimal type as dictionary gen surrogate key 
will mismatch for the same values during increment load.

## Why raise this pr?
**Fix bug: when a Decimal type column uses a dictionary, the generated 
surrogate key will mismatch for the same values during incremental load.**
For example, when a Decimal type column uses a dictionary, values go through 
`DataTypeUtil.normalizeColumnValueForItsDataType`. For decimal data such as 45, 
if the precision of the column is specified as 3, parsedValue would be 45.000, 
and this 45.000 is written into the dictionary file by 
writer.write(parsedValue). The second time we load the same data 45, 
dictionary.getSurrogateKey(value) compares the value with the dictionary 
values, but the value is 45 while the stored dictionary value is the string 
45.000. The dictionary concludes that it does not contain 45, which leads to 
repeated values in the dictionary.
## How to solve this?
Before checking the surrogate key, if the datatype is decimal, use its 
parsedValue as the value to look up, so that 45 is not treated as a different 
value.
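The mismatch and the fix can be illustrated with a small sketch. It assumes a string-keyed dictionary and uses `java.math.BigDecimal` for normalization; the class `DecimalDictionarySketch` and its methods are hypothetical and are not the project's `DataTypeUtil` or dictionary API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

final class DecimalDictionarySketch {
  private final Map<String, Integer> surrogateKeys = new HashMap<>();
  private final int scale;

  DecimalDictionarySketch(int scale) {
    this.scale = scale;
  }

  // Normalize a raw decimal string to the column's number of decimal places,
  // so "45" and "45.000" produce the same dictionary entry.
  static String normalize(String raw, int scale) {
    return new BigDecimal(raw.trim()).setScale(scale, RoundingMode.HALF_UP).toPlainString();
  }

  // Look up with the normalized value; assign a new key only when it is absent.
  int getOrAssignSurrogateKey(String raw) {
    String key = normalize(raw, scale);
    Integer existing = surrogateKeys.get(key);
    if (existing != null) {
      return existing;
    }
    int next = surrogateKeys.size() + 1;
    surrogateKeys.put(key, next);
    return next;
  }

  int size() {
    return surrogateKeys.size();
  }
}
```

Loading 45 twice, or 45 and then 45.000, yields one entry because both the write path and the lookup path use the same normalized string.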

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata decimalDic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #103


commit 0403b9fe4ed32b9cbc4727b5a541cfccb089422e
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-29T08:29:54Z

Fix the bug that when Decimal type as dictionary gen surrogate key will 
mismatch for the same values






[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

2016-08-27 Thread Zhangshunyu
GitHub user Zhangshunyu reopened a pull request:

https://github.com/apache/incubator-carbondata/pull/81

[CARBONDATA-132] Fix the bug that the CSV file header exception can not be 
shown to user using beeline. 

## Why raise this pr:
**Bug fix: the exception 'CSV File provided is not proper. Column 
names in schema and csv header are not same' is not shown in beeline.**

For example, when a data load fails because of a wrong csv file header in the 
load DDL, the exception message only appears on the executor side, like "CSV 
header provided in DDL is not proper. Column names in schema and CSV header are 
not the same", but the **user using beeline cannot get it from the driver side 
because the driver only shows "Dataload Failure"**. This makes it very 
inconvenient for the user to find the reason without checking the executor logs.

## How to solve:
Catch the exception on the driver side, parse its cause, and pass the cause 
message to the driver. DataLoadingException is shown because it is mainly about 
the CSV file and is wrapped in an understandable message that can be shown to 
the user.
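Walking the cause chain on the driver side might look like the sketch below. It is illustrative only: the nested `DataLoadingException` is a hypothetical stand-in for the project's class of the same name, and `describeFailure` is an assumed helper, not the method added by this PR.

```java
final class CauseMessageSketch {
  // Hypothetical stand-in for the project's DataLoadingException.
  static class DataLoadingException extends Exception {
    DataLoadingException(String message) {
      super(message);
    }
  }

  // Walk the cause chain of a failure reported to the driver. If a
  // DataLoadingException is found, surface its message; otherwise fall back
  // to the generic text. The depth limit guards against cyclic cause chains.
  static String describeFailure(Throwable failure) {
    Throwable current = failure;
    for (int depth = 0; current != null && depth < 32; depth++) {
      if (current instanceof DataLoadingException) {
        return "DataLoad failure: " + current.getMessage();
      }
      current = current.getCause();
    }
    return "DataLoad failure";
  }
}
```

Only the one exception type is unwrapped, so unrelated executor stack traces are not exposed to the beeline user.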

## How to test
Add new test cases:
1. If neither the DDL nor the file has a file header:
Beeline will show something like: "DataLoad failure: CSV File provided is not 
proper. Column names in schema and csv header are not same. CSVFile Name : 
windows.csv"
2. If the DDL does not provide a proper file header:
Beeline will show something like: "DataLoad failure: CSV header provided in DDL 
is not proper. Column names in schema and CSV header are not the same."

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata exc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #81


commit d6c32cb6ea80ccfe9f7aee1e14236d90933fce1a
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-22T02:08:26Z

Parse some Spark exception from executor side and show them directly on 
driver

commit 5e1235ae317ea1915f3cba24f57fad6634754af3
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2016-08-09T05:17:02Z

CARBONDATA-153 Record count is not matching while loading the data when one 
data node went down in HA setup

commit 62c0b05e62e3c2cadc03e6355bd587d14eab355c
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-22T13:29:06Z

[CARBONDATA-153] This closes #77

commit 7e0584e7a1d90724e88fffd6fcea15e5ba640da8
Author: manishgupt88 <tomanishgupt...@gmail.com>
Date:   2016-07-19T09:25:52Z

Perform equal distribution of dictionary values among the sublists of a 
list whenever a dictionary file is loaded into memory

commit 2d4609cdface93ea3f3a7a92e088e5b98f24f7e2
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-23T14:02:03Z

[CARBONDATA-80] This closes #44

commit fe1b0f07deda03fe21b98191be7750bf61d8520c
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2016-07-20T10:32:18Z

CARBONDATA-117 BlockLet distribution for optimum resource usage

commit 5ebf90a87999b9dd5ec484e54aceb7487ca3096f
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-23T15:00:07Z

[CARBONDATA-117] This closes #56

commit 61e40eb0033fca3ffc8d09d392b6090cde284652
Author: ravikiran <ravikiran.sn...@gmail.com>
Date:   2016-08-23T13:58:51Z

Delete the lock file once the unlocking is done.

commit 64586059241589ecae6e8846ff4643ab03647041
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-23T15:30:04Z

[CARBONDATA-170] This closes #86

commit 897c12a031791f60a80f859093837cbd6989e84c
Author: Jay357089 <liujunj...@huawei.com>
Date:   2016-08-22T12:19:06Z

colDict_Alldict

commit c11058d7435f4176b1fee1d9fe637eb233936a6a
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-23T18:59:57Z

[CARBONDATA-169] This closes #83

commit eac5573a644118c4942715f15e629ffa9ca1141b
Author: mohammadshahidkhan <mohdshahidkhan1...@gmail.com>
Date:   2016-08-23T15:17:28Z

[CARBONDATA-171] Block distribution not proper when the number of active 
executors more than the node size

commit 1a28ada21af0f0ff975c93252fdbec959974e542
Author: Venkata Ramana G <ramana.gollam...@huawei.com>
Date:   2016-08-23T19:28:45Z

[CARBONDATA-171] This closes #87

commit d981c0d06e0a9f0881533f87c405b4464f71019c
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-24T02:06:02Z

fix review comments

commit 6e4b21e5372c7b0d4c47dcd0d3366148d717526c
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-24T03:11:22Z

add test case

commit b59d5c77d80

[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

2016-08-27 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/81




[jira] [Created] (CARBONDATA-178) table not exist when execute show segments using spark-sql and beeline the same time

2016-08-24 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-178:
--

 Summary: table not exist when execute show segments using 
spark-sql and beeline the same time
 Key: CARBONDATA-178
 URL: https://issues.apache.org/jira/browse/CARBONDATA-178
 Project: CarbonData
  Issue Type: Bug
Reporter: zhangshunyu


1. When using beeline and sparksql at the same time, if a table is created and 
data is loaded into it by sparksql, both beeline and sparksql see the same 
table with show tables, but executing show segments from beeline throws an 
exception that the table does not exist.
2. But if beeline is restarted, or the table is selected before show segments, 
it is OK.





[GitHub] incubator-carbondata pull request #94: Fix bug that table not exist when exe...

2016-08-24 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/94

Fix bug that table not exist when execute show segments using spark-sql and 
beeline the same time.

## Why raise this pr:
1. When using beeline and sparksql at the same time, if a table is created and 
data is loaded into it by sparksql, both beeline and sparksql see the same 
table with show tables, but executing show segments from beeline throws an 
exception that the table does not exist.
2. But if beeline is restarted, or the table is selected before show segments, 
it is OK.

## How to solve this:
The problem is that beeline and sparksql run in different processes with 
different tableInfoMap instances, so beeline will not find the table in its own 
map, although sparksql put the table into its tableInfoMap.
So we can use tableExists to check: 
checkSchemasModifiedTimeAndReloadTables in tableExists checks 
"modifiedTime.mdt", and if it was changed by a different process (here spark 
sql), the other process (here beeline) reloads the metadata first.
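The reload-on-change idea can be sketched as a cache that compares a shared marker's modified time before every lookup. Everything here is an assumption made for illustration: `ReloadingTableCache`, the `LongSupplier` standing in for reading the timestamp of "modifiedTime.mdt", and the `Supplier` standing in for re-reading metadata from the store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ReloadingTableCache {
  private final LongSupplier markerModifiedTime;       // e.g. timestamp of the shared marker file
  private final Supplier<Map<String, String>> loader;  // re-reads table metadata from the store
  private long lastSeenModifiedTime = Long.MIN_VALUE;
  private Map<String, String> tables = new HashMap<>();

  ReloadingTableCache(LongSupplier markerModifiedTime, Supplier<Map<String, String>> loader) {
    this.markerModifiedTime = markerModifiedTime;
    this.loader = loader;
  }

  // Before answering, check whether another process touched the marker.
  // If so, reload the local map so tables created elsewhere become visible.
  boolean tableExists(String tableName) {
    long current = markerModifiedTime.getAsLong();
    if (current != lastSeenModifiedTime) {
      tables = new HashMap<>(loader.get());
      lastSeenModifiedTime = current;
    }
    return tables.containsKey(tableName);
  }
}
```

The cost is one timestamp read per lookup; the full metadata reload only happens when the marker actually changed.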

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata showloadbug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/94.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #94


commit 9e1a4d918e355aed17643b9d4e97bee81147b774
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-25T03:40:35Z

Fix the bug that table not exist exception occured when using sparksql and 
beeline the same time






[GitHub] incubator-carbondata pull request #89: Fix the problem of hdfs lock and move...

2016-08-24 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/89

Fix the problem of hdfs lock and move the lock file inside the table folder.

## Why raise this pr
1. The HDFS lock file for a table should be placed inside that table's store 
path, which is more reasonable. If the store path is not set, we put it 
into hadoop.tmp.dir.
For example: if the store path of carbon on HDFS is 
/user/hive/warehouse/carbon.store, then the lock file for this table would be: 
/user/hive/warehouse/carbon.store/default/table_name/meta.lock
2. This bug was found as follows: sometimes hadoop.tmp.dir is configured 
incorrectly. Hadoop can still work normally, but carbon's HDFS lock cannot, 
and it throws the exception: "Table is locked for updation. Please 
try after some time".
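
A minimal sketch of the path rule described above. `LockPath.lockFilePath` is a hypothetical helper, not the method changed in this PR, and it assumes the same database/table layout is used under hadoop.tmp.dir when the store path is not set, which the description does not state.

```scala
// Hypothetical helper, not CarbonData's actual API.
object LockPath {
  // Builds <base>/<database>/<table>/meta.lock, where base is the store path
  // if it is set and non-empty, otherwise the hadoop.tmp.dir value.
  // Assumption: the fallback directory uses the same database/table layout.
  def lockFilePath(
      storePath: Option[String],
      hadoopTmpDir: String,
      dbName: String,
      tableName: String): String = {
    val base = storePath.map(_.trim).filter(_.nonEmpty).getOrElse(hadoopTmpDir)
    Seq(base.stripSuffix("/"), dbName, tableName, "meta.lock").mkString("/")
  }
}
```

With the store path from the example, `LockPath.lockFilePath(Some("/user/hive/warehouse/carbon.store"), "/tmp/hadoop", "default", "table_name")` gives the lock file path shown in item 1.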


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata hdfslock

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/89.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #89


commit 2ea2bfbef39622d7371c3afcdd2bbe5ce278bb22
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-24T06:30:50Z

fix the problem of hdfs lock and move the lock file inside the table folder






[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

2016-08-23 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984291
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
   loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
   logInfo("DataLoad failure")
   logger.error(ex)
--- End diff --

OK




[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Parse some Spark exc...

2016-08-23 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/81#discussion_r75984276
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -775,6 +777,13 @@ object CarbonDataRDDFactory extends Logging {
   loadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE
   logInfo("DataLoad failure")
   logger.error(ex)
+  ex match {
+    case sparkException: SparkException =>
+      if (sparkException.getCause.isInstanceOf[DataLoadingException]) {
+        executorMessage = sparkException.getCause.getMessage
+      }
+    case _ =>
--- End diff --

Here we only take the DataLoadingException from the executor and show it 
directly to the user, so that they can see what they did wrong. For other 
exceptions we still use "DataLoad Failure", because we do not show internal 
errors to the user.
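
As a standalone illustration of this rule (not the code in the diff): the two exception classes below are hypothetical stand-ins for SparkException and DataLoadingException, so that the sketch compiles without Spark or CarbonData on the classpath.

```scala
// Hypothetical stand-ins for the real exception types.
class DataLoadingException(msg: String) extends Exception(msg)
class WrappedTaskException(msg: String, cause: Throwable) extends Exception(msg, cause)

object LoadErrors {
  val GenericMessage = "DataLoad failure"

  // Only a DataLoadingException wrapped by the task exception is surfaced;
  // everything else maps to the generic message so internal errors stay hidden.
  def userMessage(ex: Throwable): String = ex match {
    case wrapped: WrappedTaskException =>
      wrapped.getCause match {
        case dle: DataLoadingException => dle.getMessage
        case _ => GenericMessage // also covers a null cause
      }
    case _ => GenericMessage
  }
}
```

Matching on the cause with a type pattern, instead of isInstanceOf, also handles a missing cause, since a type pattern never matches null.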




[GitHub] incubator-carbondata pull request #84: [CARBONDATA-167]Fix that 'UndeclaredT...

2016-08-23 Thread Zhangshunyu
Github user Zhangshunyu closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/84




[jira] [Created] (CARBONDATA-166) create table contains shared dictionary,and the shared dictionary keywords is not complete,create table can success and load failed

2016-08-22 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-166:
--

 Summary: create table contains shared dictionary,and the shared 
dictionary keywords is not complete,create table can success and load failed
 Key: CARBONDATA-166
 URL: https://issues.apache.org/jira/browse/CARBONDATA-166
 Project: CarbonData
  Issue Type: Bug
Reporter: zhangshunyu
Assignee: zhangshunyu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #70: [CARBONDATA-154] Fix the bug of block...

2016-08-09 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/70

[CARBONDATA-154] Fix the bug of block prune that query result is wrong.

## Why raise this pr:

During block pruning, the end key is always decided only by the last filter 
expression. This is a bug and can lead to wrong results.
For example, when the loaded data has a dimension column with 12 rows of 'a', 
12 rows of 'b' and 12 rows of 'c', a query like "select * from 
tablename where colname='c' or colname='b' or colname='a'" selects only the 
12 rows of 'a' because of the wrong end key.

## How to solve this:

Compute the keys from all the filter expressions: for each column level, the 
end key takes the max of the end keys and the start key takes (min - 1) of 
the start keys. These are used to produce a new start key and a new end key.

For more details please look at the test case.
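
A simplified sketch of combining the key ranges of OR-ed filter expressions. `KeyRange` and `BlockPrune.combine` are hypothetical names, keys are modelled as one integer per column level with inclusive bounds, and the (min - 1) adjustment of the start key is not reproduced here because its exact semantics are not spelled out in this description.

```scala
// Hypothetical model: one inclusive key range per filter expression,
// with one integer key value per column level.
final case class KeyRange(start: Vector[Int], end: Vector[Int])

object BlockPrune {
  // Combine all ranges instead of keeping only the last one:
  // per column level, take the min of the start keys and the max of the end keys.
  def combine(ranges: Seq[KeyRange]): KeyRange = {
    require(ranges.nonEmpty, "at least one filter range is needed")
    val starts = ranges.map(_.start).transpose.map(_.min).toVector
    val ends = ranges.map(_.end).transpose.map(_.max).toVector
    KeyRange(starts, ends)
  }
}
```

If 'a', 'b' and 'c' are assumed to map to keys 1, 2 and 3, the three filters in the example give the ranges [3,3], [2,2] and [1,1]. Taking only the last one covers just 'a', while the combined range [1,3] covers all three values.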

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata blockprune

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/70.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #70


commit d2bdb538f8b30df39ae3730aa0498a44ed934f03
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-08-10T03:09:20Z

fix the bug og block prune






[jira] [Created] (CARBONDATA-154) Block prune can not get the right blocks and query result is wrong

2016-08-09 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-154:
--

 Summary: Block prune can not get the right blocks and query result 
is wrong
 Key: CARBONDATA-154
 URL: https://issues.apache.org/jira/browse/CARBONDATA-154
 Project: CarbonData
  Issue Type: Bug
Reporter: zhangshunyu
Priority: Critical








[jira] [Created] (CARBONDATA-123) Stored by 'carbondata' or 'org.apache.carbondata.format' should not be case sensitive

2016-07-28 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-123:
--

 Summary: Stored by 'carbondata' or 'org.apache.carbondata.format' 
should not be case sensitive
 Key: CARBONDATA-123
 URL: https://issues.apache.org/jira/browse/CARBONDATA-123
 Project: CarbonData
  Issue Type: Bug
Reporter: zhangshunyu
Assignee: zhangshunyu








[jira] [Created] (CARBONDATA-104) To support varchar datatype

2016-07-25 Thread zhangshunyu (JIRA)
zhangshunyu created CARBONDATA-104:
--

 Summary: To support varchar datatype
 Key: CARBONDATA-104
 URL: https://issues.apache.org/jira/browse/CARBONDATA-104
 Project: CarbonData
  Issue Type: New Feature
Reporter: zhangshunyu
Priority: Minor








[GitHub] incubator-carbondata pull request #24: Correct the log info

2016-07-05 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/24

Correct the log info



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata info

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #24


commit e03c277d1e2327a4972fe34ece0c58b316371925
Author: Zhangshunyu <zhangshu...@huawei.com>
Date:   2016-07-06T03:49:41Z

correct the log info



