Github user kumarvishal09 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/194
LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Github user sujith71955 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/194
PR build status
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/733
---
Github user PallaviSingh1992 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/336#discussion_r90590085
--- Diff:
integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CarbonCompactionUtil.java
---
@@ -142,20 +142,14 @@ private static void
groupCorrespodingInfoBasedOnTask(TableBlockInfo info,
* @return
*/
public static boolean isCompactionRequiredForTable(String
metaFolderPath) {
-String minorCompactionStatusFile = metaFolderPath +
CarbonCommonConstants.FILE_SEPARATOR
-+ CarbonCommonConstants.minorCompactionRequiredFile;
-
-String majorCompactionStatusFile = metaFolderPath +
CarbonCommonConstants.FILE_SEPARATOR
-+ CarbonCommonConstants.majorCompactionRequiredFile;
+String statusFile = metaFolderPath +
CarbonCommonConstants.FILE_SEPARATOR;
try {
- if (FileFactory.isFileExist(minorCompactionStatusFile,
- FileFactory.getFileType(minorCompactionStatusFile)) ||
FileFactory
- .isFileExist(majorCompactionStatusFile,
- FileFactory.getFileType(majorCompactionStatusFile))) {
-return true;
- }
+ return (FileFactory.isFileExist(statusFile +
CarbonCommonConstants.minorCompactionRequiredFile,
--- End diff --
I will revert to the original code.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588455
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -122,24 +139,52 @@ private boolean internalHasNext() {
if (!hasNext) {
// Check next iterator is available in the list.
if (counter < inputIterators.size()) {
+ // close the old iterator
+ currentIterator.close();
// Get the next iterator from the list.
currentIterator = inputIterators.get(counter++);
+ // Initialize the new iterator
+ currentIterator.initialize();
hasNext = internalHasNext();
}
}
return hasNext;
}
-@Override
-public CarbonRowBatch next() {
- // Create batch and fill it.
- CarbonRowBatch carbonRowBatch = new CarbonRowBatch();
- int count = 0;
- while (internalHasNext() && count < batchSize) {
-carbonRowBatch.addRow(new
CarbonRow(rowParser.parseRow(currentIterator.next(;
-count++;
+@Override public CarbonRowBatch next() {
+ CarbonRowBatch result = null;
+ try {
+if (future == null) {
+ future = getCarbonRowBatch();
+}
+result = future.get();
+nextBatch = false;
+if (hasNext()) {
+ nextBatch = true;
+ future = getCarbonRowBatch();
+} else {
+ currentIterator.close();
+}
+ } catch (Exception e) {
--- End diff --
Catch only InterruptedException and ExecutionException.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588425
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -122,24 +139,52 @@ private boolean internalHasNext() {
if (!hasNext) {
// Check next iterator is available in the list.
if (counter < inputIterators.size()) {
+ // close the old iterator
+ currentIterator.close();
// Get the next iterator from the list.
currentIterator = inputIterators.get(counter++);
+ // Initialize the new iterator
+ currentIterator.initialize();
hasNext = internalHasNext();
}
}
return hasNext;
}
-@Override
-public CarbonRowBatch next() {
- // Create batch and fill it.
- CarbonRowBatch carbonRowBatch = new CarbonRowBatch();
- int count = 0;
- while (internalHasNext() && count < batchSize) {
-carbonRowBatch.addRow(new
CarbonRow(rowParser.parseRow(currentIterator.next(;
-count++;
+@Override public CarbonRowBatch next() {
+ CarbonRowBatch result = null;
+ try {
--- End diff --
limit the try scope to `future.get` only
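The two review comments on this method (catch only InterruptedException and ExecutionException, and keep the try block around `future.get` only) can be sketched as follows. This is a minimal illustration, not the code of the pull request: PrefetchSketch, submitNext and the String batches are hypothetical stand-ins for the iterator and CarbonRowBatch in the diff, and wrapping failures in IllegalStateException is an assumption.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the prefetching iterator in the diff above;
// the class, field and method names are assumptions, not CarbonData code.
final class PrefetchSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final int total;
  private int submitted;
  private Future<String> future;

  PrefetchSketch(int total) {
    this.total = total;
  }

  // Submits the task that produces the next batch in the background.
  private Future<String> submitNext() {
    final int id = submitted++;
    return executor.submit(() -> "batch-" + id);
  }

  boolean hasNext() {
    return future != null || submitted < total;
  }

  String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    if (future == null) {
      future = submitNext();
    }
    String result;
    try {
      // Only the blocking call lives inside the try block.
      result = future.get();
    } catch (InterruptedException e) {
      // Restore the interrupt flag before propagating.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for a batch", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Batch task failed", e.getCause());
    }
    // Prefetch the following batch outside the try block.
    future = submitted < total ? submitNext() : null;
    return result;
  }

  void close() {
    executor.shutdown();
  }

  public static void main(String[] args) {
    PrefetchSketch it = new PrefetchSketch(2);
    while (it.hasNext()) {
      System.out.println(it.next());
    }
    it.close();
  }
}
```

With the try block this narrow, an unexpected runtime exception from the prefetch bookkeeping is not swallowed by a broad `catch (Exception e)`.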
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588408
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -122,24 +139,52 @@ private boolean internalHasNext() {
if (!hasNext) {
// Check next iterator is available in the list.
if (counter < inputIterators.size()) {
+ // close the old iterator
+ currentIterator.close();
// Get the next iterator from the list.
currentIterator = inputIterators.get(counter++);
+ // Initialize the new iterator
+ currentIterator.initialize();
hasNext = internalHasNext();
}
}
return hasNext;
}
-@Override
-public CarbonRowBatch next() {
- // Create batch and fill it.
- CarbonRowBatch carbonRowBatch = new CarbonRowBatch();
- int count = 0;
- while (internalHasNext() && count < batchSize) {
-carbonRowBatch.addRow(new
CarbonRow(rowParser.parseRow(currentIterator.next(;
-count++;
+@Override public CarbonRowBatch next() {
--- End diff --
Put @Override on its own line, above the method signature.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588348
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -122,24 +139,52 @@ private boolean internalHasNext() {
if (!hasNext) {
// Check next iterator is available in the list.
if (counter < inputIterators.size()) {
+ // close the old iterator
+ currentIterator.close();
// Get the next iterator from the list.
currentIterator = inputIterators.get(counter++);
+ // Initialize the new iterator
+ currentIterator.initialize();
hasNext = internalHasNext();
}
}
return hasNext;
}
-@Override
-public CarbonRowBatch next() {
- // Create batch and fill it.
- CarbonRowBatch carbonRowBatch = new CarbonRowBatch();
- int count = 0;
- while (internalHasNext() && count < batchSize) {
-carbonRowBatch.addRow(new
CarbonRow(rowParser.parseRow(currentIterator.next(;
-count++;
+@Override public CarbonRowBatch next() {
+ CarbonRowBatch result = null;
+ try {
--- End diff --
Put @Override on its own line, above the method signature.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588344
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -80,40 +87,50 @@ public void initialize() throws
CarbonDataLoadingException {
return iterators;
}
- @Override
- protected CarbonRow processRow(CarbonRow row) {
+ @Override protected CarbonRow processRow(CarbonRow row) {
return null;
}
+ @Override public void close() {
+executorService.shutdown();
+ }
+
/**
* This iterator wraps the list of iterators and it starts iterating the
each
* iterator of the list one by one. It also parse the data while
iterating it.
*/
private static class InputProcessorIterator extends
CarbonIterator {
-private List> inputIterators;
+private List> inputIterators;
-private Iterator
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90588312
--- Diff:
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/InputProcessorStepImpl.java
---
@@ -80,40 +87,50 @@ public void initialize() throws
CarbonDataLoadingException {
return iterators;
}
- @Override
- protected CarbonRow processRow(CarbonRow row) {
+ @Override protected CarbonRow processRow(CarbonRow row) {
return null;
}
+ @Override public void close() {
+executorService.shutdown();
+ }
+
/**
* This iterator wraps the list of iterators and it starts iterating the
each
* iterator of the list one by one. It also parse the data while
iterating it.
*/
private static class InputProcessorIterator extends
CarbonIterator {
-private List> inputIterators;
+private List> inputIterators;
-private Iterator currentIterator;
+private InputIterator currentIterator;
private int counter;
private int batchSize;
private RowParser rowParser;
-public InputProcessorIterator(List> inputIterators,
-RowParser rowParser, int batchSize) {
+private Future future;
+
+private ExecutorService executorService;
+
+private boolean nextBatch = false;
--- End diff --
Initialize this in the constructor, like counter.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/333#discussion_r90587816
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/csv/CSVInputFormat.java ---
@@ -138,6 +140,17 @@ public static void setQuoteCharacter(String
quoteCharacter, Configuration config
}
/**
+ * Sets the read buffer size to configuration.
+ * @param bufferSize
+ * @param configuration
+ */
+ public static void setReadBufferSize(String bufferSize, Configuration
configuration) {
--- End diff --
Why is bufferSize a String rather than an int?
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/379
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/379
LGTM
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/379
CI
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/732/
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/380
---
GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/380
Fix compatibility
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata comp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/380.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #380
commit 7904716b9396b9ba660e4cb08ef0cba1821f3166
Author: jackylk
Date: 2016-12-02T04:10:58Z
fix compatibility
---
Github user sujith71955 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/336
Please rebase the code
---
Github user sujith71955 commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/194#discussion_r90505720
--- Diff:
core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ---
@@ -411,4 +411,24 @@ private static String parseStringToBigDecimal(String
value, CarbonDimension dime
}
return null;
}
+ /**
+ * This method will compare double values it will preserve
+ * the -0.0 and 0.0 equality as per == ,also preserve NaN equality check
as per
+ * java.lang.Double.equals()
+ *
+ * @param d1 double value for equality check
+ * @param d2 double value for equality check
+ * @return boolean after comparing two double values.
+ */
+ public static int compareDoubleWithNan(Double d1, Double d2) {
+if ((d1.doubleValue() == d2.doubleValue()) || (Double.isNaN(d1) &&
Double.isNaN(d2))) {
+ return 0;
+}
+else if (d1 < d2) {
+ return -1;
+}
+else {
--- End diff --
Yes, I think we can remove the unnecessary else statement.
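For reference, a sketch of the method with the trailing else removed might look like this. The body of the final branch is cut off in the diff above, so `return 1` is an assumption, and DoubleCompareSketch is a hypothetical holder class rather than DataTypeUtil itself.

```java
// Hypothetical holder class; only the method shape follows the diff above.
final class DoubleCompareSketch {
  static int compareDoubleWithNan(Double d1, Double d2) {
    // == treats -0.0 and 0.0 as equal; the isNaN check makes NaN equal to NaN.
    if (d1.doubleValue() == d2.doubleValue() || (Double.isNaN(d1) && Double.isNaN(d2))) {
      return 0;
    }
    if (d1 < d2) {
      return -1;
    }
    // Assumed final branch: everything else compares as greater.
    return 1;
  }

  public static void main(String[] args) {
    System.out.println(compareDoubleWithNan(0.0, -0.0));
    System.out.println(compareDoubleWithNan(Double.NaN, Double.NaN));
    System.out.println(compareDoubleWithNan(1.0, 2.0));
  }
}
```

Each branch returns, so neither `else` is needed. Note that under this sketch a NaN compared with a non-NaN value falls through to the last return.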
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/378
---
[
https://issues.apache.org/jira/browse/CARBONDATA-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ravindra Pesala resolved CARBONDATA-480.
Resolution: Fixed
Assignee: Jacky Li
> Add file format version enum
>
>
> Key: CARBONDATA-480
> URL: https://issues.apache.org/jira/browse/CARBONDATA-480
> Project: CarbonData
> Issue Type: Improvement
>Affects Versions: 0.2.0-incubating
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 0.3.0-incubating
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Add file format version enum instead of using short value
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/378
LGTM
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/378
CI passed
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/730/
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/366
---
Jacky Li created CARBONDATA-480:
---
Summary: Add file format version enum
Key: CARBONDATA-480
URL: https://issues.apache.org/jira/browse/CARBONDATA-480
Project: CarbonData
Issue Type: Improvement
Affects Versions: 0.2.0-incubating
Reporter: Jacky Li
Fix For: 0.3.0-incubating
Add file format version enum instead of using short value
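As an illustration of the idea only, an enum that wraps the short value could look like the sketch below. The name FormatVersionSketch, the constants V1 and V2, and the fromNumber helper are all hypothetical; the issue does not specify them.

```java
// Hypothetical enum; names and version numbers are assumptions for illustration.
enum FormatVersionSketch {
  V1((short) 1),
  V2((short) 2);

  private final short number;

  FormatVersionSketch(short number) {
    this.number = number;
  }

  // The raw value that would be written to or read from the file.
  short number() {
    return number;
  }

  // Maps a raw short back to the enum, rejecting unknown versions.
  static FormatVersionSketch fromNumber(short number) {
    for (FormatVersionSketch version : values()) {
      if (version.number == number) {
        return version;
      }
    }
    throw new IllegalArgumentException("Unsupported file format version: " + number);
  }
}
```

Callers then pass FormatVersionSketch values around instead of bare shorts, so an unsupported version fails at the point where the raw value is parsed.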
--
GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/378
[CARBONDATA-480] Add file format version enum
Add file format version enum instead of using short value
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata fixversion
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/378.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #378
commit d9413c7e13a5e9e15c1d96b99598a96ab7da5979
Author: jackylk
Date: 2016-12-01T15:02:16Z
add file format version enum
---
Lionx created CARBONDATA-479:
Summary: Guarantee consistency for keyword LOCAL and file path in
data loading command
Key: CARBONDATA-479
URL: https://issues.apache.org/jira/browse/CARBONDATA-479
Project: CarbonData
Issue Type: Bug
Reporter: Lionx
Priority: Minor
In CarbonSqlParser.scala,
protected lazy val loadDataNew: Parser[LogicalPlan] =
  LOAD ~> DATA ~> opt(LOCAL) ~> INPATH ~> stringLit ~ opt(OVERWRITE) ~
    (INTO ~> TABLE ~> (ident <~ ".").? ~ ident) ~
    (OPTIONS ~> "(" ~> repsep(loadOptions, ",") <~ ")").? <~ opt(";") ^^ {
      case filePath ~ isOverwrite ~ table ~ optionsList =>
        val (databaseNameOp, tableName) = table match {
          case databaseName ~ tableName => (databaseName, tableName.toLowerCase())
        }
        if (optionsList.isDefined) {
          validateOptions(optionsList)
        }
        val optionsMap = optionsList.getOrElse(List.empty[(String, String)]).toMap
        LoadTable(databaseNameOp, tableName, filePath, Seq(), optionsMap,
          isOverwrite.isDefined)
    }
It seems that the keyword LOCAL has no effect: whether data is loaded from HDFS
or from the local file system depends only on the path.
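One possible shape for such a consistency check is sketched below in Java. Everything here is an assumption for illustration: LoadPathCheckSketch, isLocalPath and validate are hypothetical names, and treating a path without a scheme as local is a simplification (a real deployment may resolve it against a default file system instead).

```java
import java.net.URI;

// Hypothetical helper; not part of CarbonSqlParser.
final class LoadPathCheckSketch {
  // Assumption: a path is local when it has no scheme or the "file" scheme.
  static boolean isLocalPath(String path) {
    String scheme = URI.create(path).getScheme();
    return scheme == null || scheme.equalsIgnoreCase("file");
  }

  // Rejects a load command whose LOCAL keyword disagrees with the path.
  static void validate(boolean localKeyword, String path) {
    if (localKeyword != isLocalPath(path)) {
      throw new IllegalArgumentException(
          "LOCAL keyword " + (localKeyword ? "given" : "missing")
              + " but path is " + path);
    }
  }

  public static void main(String[] args) {
    validate(true, "/tmp/sample.csv");
    validate(false, "hdfs://namenode:9000/data/sample.csv");
    System.out.println("both commands are consistent");
  }
}
```

URI.create throws on characters such as spaces or backslashes, so a real check would need to normalize the path first.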
--
Github user ashokblend commented on the issue:
https://github.com/apache/incubator-carbondata/pull/366
rebase done. please merge it.
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/377
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/377
LGTM
---
GitHub user QiangCai opened a pull request:
https://github.com/apache/incubator-carbondata/pull/377
[CARBONDATA-478]Spark2 module should have different SparkRowReadSupportImpl
with spark1
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/QiangCai/incubator-carbondata master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/377.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #377
commit 3bc55a38c5d645ca1e07381910692ac0b2bb6297
Author: QiangCai
Date: 2016-12-01T11:32:04Z
fixLatedecoderIssueForSpark2
---
Github user ashokblend commented on the issue:
https://github.com/apache/incubator-carbondata/pull/304
Closing this, as it's not required here.
---
Github user ashokblend closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/304
---
[
https://issues.apache.org/jira/browse/CARBONDATA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-458.
-
Resolution: Fixed
Fix Version/s: 0.3.0-incubating
> Improving carbon first time query performance
> --
>
> Key: CARBONDATA-458
> URL: https://issues.apache.org/jira/browse/CARBONDATA-458
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load, data-query
>Reporter: kumar vishal
>Assignee: kumar vishal
> Fix For: 0.3.0-incubating
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> Improving carbon first time query performance
> Reason:
> 1. Because the file system cache has been cleared, reading the files and
> caching them is slower.
> 2. On the first query, Carbon has to read the footer from the data file to
> build the B-tree.
> 3. Carbon reads more footer data than is required (data chunk).
> 4. Many random seeks happen in Carbon because the column data (data page, RLE,
> inverted index) is not stored together.
> Solution:
> 1. Improve block loading time. This can be done by removing the data chunk from
> blockletInfo and storing only the offset and length of the data chunk.
> 2. Compress the presence meta bitset stored for null values of measure columns
> using Snappy.
> 3. Store the metadata and data of a column together and read them together;
> this reduces random seeks and improves I/O.
--
[
https://issues.apache.org/jira/browse/CARBONDATA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-100.
-
Resolution: Fixed
Assignee: Ashok Kumar
Fix Version/s: 0.3.0-incubating
> BigInt compression
> --
>
> Key: CARBONDATA-100
> URL: https://issues.apache.org/jira/browse/CARBONDATA-100
> Project: CarbonData
> Issue Type: Bug
>Reporter: Ashok Kumar
>Assignee: Ashok Kumar
> Fix For: 0.3.0-incubating
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In Carbon, bigint is stored as long and no compression is applied to the data.
> A change is required to compress bigint data as we do for double.
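The issue does not say which scheme is intended. Purely as an illustration, one common way to compress long columns is to narrow them to the smallest integer width that covers their range. The sketch below only computes that width; LongNarrowingSketch and bytesNeeded are hypothetical names, and nothing here is taken from the CarbonData implementation.

```java
// Hypothetical illustration of range-based narrowing; not CarbonData code.
final class LongNarrowingSketch {
  // Returns the bytes per value (1, 2, 4 or 8) needed to hold every element.
  static int bytesNeeded(long[] values) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long value : values) {
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) {
      return 1;
    }
    if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) {
      return 2;
    }
    if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) {
      return 4;
    }
    return 8;
  }

  public static void main(String[] args) {
    System.out.println(bytesNeeded(new long[] {1, 2, 100}));
    System.out.println(bytesNeeded(new long[] {1, 300}));
  }
}
```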
--
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/338
LGTM
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/338
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/338
LGTM
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/338
Thanks for working on this.
---
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/338
LGTM
---
[
https://issues.apache.org/jira/browse/CARBONDATA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711514#comment-15711514
]
Jacky Li commented on CARBONDATA-458:
-
This work is not about merging footers into a central file; it is about
reorganizing the internal structure of the carbon file to make the first-time
query faster. I think the biggest bottlenecks are the 3rd and 4th of the points
Vishal has listed:
3. Carbon reads more footer data than is required (data chunk).
4. Many random seeks happen in Carbon because the column data (data page, RLE,
inverted index) is not stored together.
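For point 3, the issue proposes keeping only the offset and length of each data chunk in the metadata. A minimal sketch of that idea follows, assuming a hypothetical ChunkPointerSketch class and a toy file layout; it is not the CarbonData reader.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical metadata entry: where a chunk lives, not the chunk itself.
final class ChunkPointerSketch {
  private final long offset;
  private final int length;

  ChunkPointerSketch(long offset, int length) {
    this.offset = offset;
    this.length = length;
  }

  // Reads exactly `length` bytes starting at `offset`, only when asked.
  byte[] read(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buffer = ByteBuffer.allocate(length);
      long position = offset;
      while (buffer.hasRemaining()) {
        int n = channel.read(buffer, position);
        if (n < 0) {
          throw new EOFException("Chunk extends past the end of the file");
        }
        position += n;
      }
      return buffer.array();
    }
  }

  // Writes a toy file and reads the middle chunk back through a pointer.
  static String demo() {
    try {
      Path file = Files.createTempFile("chunk-sketch", ".bin");
      try {
        Files.write(file, "headerCHUNKfooter".getBytes(StandardCharsets.US_ASCII));
        byte[] chunk = new ChunkPointerSketch(6, 5).read(file);
        return new String(chunk, StandardCharsets.US_ASCII);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The metadata stays small because it holds two numbers per chunk, and the chunk bytes are fetched with a single positioned read when a query needs them.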
> Improving carbon first time query performance
> --
>
> Key: CARBONDATA-458
> URL: https://issues.apache.org/jira/browse/CARBONDATA-458
> Project: CarbonData
> Issue Type: Improvement
> Components: core, data-load, data-query
>Reporter: kumar vishal
>Assignee: kumar vishal
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Improving carbon first time query performance
> Reason:
> 1. Because the file system cache has been cleared, reading the files and
> caching them is slower.
> 2. On the first query, Carbon has to read the footer from the data file to
> build the B-tree.
> 3. Carbon reads more footer data than is required (data chunk).
> 4. Many random seeks happen in Carbon because the column data (data page, RLE,
> inverted index) is not stored together.
> Solution:
> 1. Improve block loading time. This can be done by removing the data chunk from
> blockletInfo and storing only the offset and length of the data chunk.
> 2. Compress the presence meta bitset stored for null values of measure columns
> using Snappy.
> 3. Store the metadata and data of a column together and read them together;
> this reduces random seeks and improves I/O.
--
Github user jackylk commented on the issue:
https://github.com/apache/incubator-carbondata/pull/265
LGTM
---
Github user Zhangshunyu closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/376
---
Github user kumarvishal09 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/265
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/org.apache.carbondata$carbondata-spark/719/testReport/
---
GitHub user Zhangshunyu opened a pull request:
https://github.com/apache/incubator-carbondata/pull/376
[WIP]TO support insert 1 line into carbon table.
WIP
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Zhangshunyu/incubator-carbondata insert1line
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/376.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #376
commit 43c363315ff980f02fe060ad0c25a4d028d463f7
Author: Zhangshunyu
Date: 2016-12-01T08:20:04Z
To support insert into one line
commit 600fc29e24c9766a63f239e543ca23ead53c235e
Author: Zhangshunyu
Date: 2016-12-01T08:20:59Z
To support insert into one line
commit 3b0b0bc16ff1ffe0e59d338ec94d36a3317e4a1e
Author: Zhangshunyu
Date: 2016-12-01T08:24:18Z
To support insert into one line
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/265#discussion_r90400739
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -340,17 +341,17 @@ private Expression getFilterPredicates(Configuration configuration) {
}
resultFilterredBlocks.addAll(filterredBlocks);
}
-statistic.addStatistics(QueryStatisticsConstants.LOAD_BLOCKS_DRIVER,
-System.currentTimeMillis());
+statistic
+.addStatistics(QueryStatisticsConstants.LOAD_BLOCKS_DRIVER, System.currentTimeMillis());
--- End diff --
no need to
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/265#discussion_r90400632
--- Diff:
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -193,8 +203,7 @@ public static void setSegmentsToAccess(Configuration configuration, List
* @return List list of CarbonInputSplit
* @throws IOException
*/
- @Override
- public List getSplits(JobContext job) throws IOException {
+ @Override public List getSplits(JobContext job) throws IOException {
--- End diff --
Move `@Override` to the previous line.
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/265#discussion_r90399963
--- Diff:
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -814,85 +810,86 @@
* Rocord size in case of compaction.
*/
public static final int COMPACTION_INMEMORY_RECORD_SIZE = 12;
-
- /**
- * If the level 2 compaction is done in minor then new compacted segment will end with .2
- */
- public static String LEVEL2_COMPACTION_INDEX = ".2";
-
- /**
- * Indicates compaction
- */
- public static String COMPACTION_KEY_WORD = "COMPACTION";
-
/**
* hdfs temporary directory key
*/
public static final String HDFS_TEMP_LOCATION = "hadoop.tmp.dir";
-
/**
* zookeeper url key
*/
public static final String ZOOKEEPER_URL = "spark.deploy.zookeeper.url";
-
/**
* configure the minimum blocklet size eligible for blocklet distribution
*/
public static final String CARBON_BLOCKLETDISTRIBUTION_MIN_REQUIRED_SIZE =
"carbon.blockletdistribution.min.blocklet.size";
-
/**
* default blocklet size eligible for blocklet distribution
*/
public static final int DEFAULT_CARBON_BLOCKLETDISTRIBUTION_MIN_REQUIRED_SIZE = 2;
-
+ /**
+ * This batch size is used to send rows from load step to another step in batches.
+ */
+ public static final String DATA_LOAD_BATCH_SIZE = "DATA_LOAD_BATCH_SIZE";
+ /**
+ * Default size of data load batch size.
+ */
+ public static final String DATA_LOAD_BATCH_SIZE_DEFAULT = "1000";
+ /**
+ * carbon data file version property
+ */
+ public static final String CARBON_DATA_FILE_VERSION = "carbon.data.file.version";
+ /**
+ * current data file version
+ */
+ public static final short CARBON_DATA_FILE_CURRENT_VERSION = 2;
--- End diff --
Change `2` to an enum as well.
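One way the suggestion could look is sketched below. The enum name `DataFileVersion`, its members, and the `fromNumber` helper are assumptions made for illustration; the PR does not show what the author eventually chose.

```java
// Hypothetical sketch of replacing the raw short constant with an enum.
class DataFileVersionSketch {

  /** Illustrative enum; the name and members are assumed, not taken from the PR. */
  enum DataFileVersion {
    V1((short) 1),
    V2((short) 2);

    private final short number;

    DataFileVersion(short number) {
      this.number = number;
    }

    /** Numeric form, e.g. for writing into a file header or a property value. */
    short number() {
      return number;
    }

    /** Maps a stored number back to the enum, rejecting unknown versions. */
    static DataFileVersion fromNumber(short number) {
      for (DataFileVersion version : values()) {
        if (version.number == number) {
          return version;
        }
      }
      throw new IllegalArgumentException("Unsupported data file version: " + number);
    }
  }

  /** Stands in for the `short CARBON_DATA_FILE_CURRENT_VERSION = 2` constant in the diff. */
  static final DataFileVersion CARBON_DATA_FILE_CURRENT_VERSION = DataFileVersion.V2;

  public static void main(String[] args) {
    System.out.println(
        CARBON_DATA_FILE_CURRENT_VERSION + " -> " + CARBON_DATA_FILE_CURRENT_VERSION.number());
  }
}
```

With an enum, an unsupported version number fails at the parsing boundary instead of propagating as a bare `short` through the reader code.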
---
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/338#discussion_r90398660
--- Diff:
core/src/main/java/org/apache/carbondata/core/util/ValueCompressionUtil.java ---
@@ -243,6 +261,20 @@ public static Object getCompressedValues(COMPRESSION_TYPE compType, long[] value
}
}
+ /**
+ *
--- End diff --
Please describe this function.
---