[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1297


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138941702
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeDecimalColumnPage.java
 ---
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page;
+
+import java.math.BigDecimal;
+
+import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.memory.CarbonUnsafe;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.memory.UnsafeMemoryManager;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.util.ByteUtil;
+
+/**
+ * Represents a columnar data for decimal data type column for one page
+ */
+public class UnsafeDecimalColumnPage extends DecimalColumnPage {
+
+  UnsafeDecimalColumnPage(TableSpec.ColumnSpec columnSpec, DataType 
dataType, int pageSize)
+  throws MemoryException {
+super(columnSpec, dataType, pageSize);
+capacity = (int) (pageSize * DEFAULT_ROW_SIZE * FACTOR);
+initMemory();
+  }
+
+  UnsafeDecimalColumnPage(TableSpec.ColumnSpec columnSpec, DataType 
dataType, int pageSize,
+  int capacity) throws MemoryException {
+super(columnSpec, dataType, pageSize);
+this.capacity = capacity;
+initMemory();
+  }
+
+  private void initMemory() throws MemoryException {
+switch (dataType) {
+  case BYTE:
+  case SHORT:
+  case INT:
+  case LONG:
+int size = pageSize << dataType.getSizeBits();
+memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId, 
size);
+baseAddress = memoryBlock.getBaseObject();
+baseOffset = memoryBlock.getBaseOffset();
+break;
+  case SHORT_INT:
+size = pageSize * 3;
+memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId, 
size);
+baseAddress = memoryBlock.getBaseObject();
+baseOffset = memoryBlock.getBaseOffset();
+break;
+  case DECIMAL:
+  case STRING:
--- End diff --

yes it will not be string..I will remove


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138941633
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
 ---
@@ -22,21 +22,48 @@
 import java.util.List;
 
 import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.memory.CarbonUnsafe;
+import org.apache.carbondata.core.memory.MemoryBlock;
 import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.memory.UnsafeMemoryManager;
 import org.apache.carbondata.core.metadata.datatype.DataType;
 import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.core.util.ThreadLocalTaskInfo;
 
+import static org.apache.carbondata.core.metadata.datatype.DataType.BYTE;
 import static 
org.apache.carbondata.core.metadata.datatype.DataType.DECIMAL;
 
 public abstract class VarLengthColumnPageBase extends ColumnPage {
 
+  static final int byteBits = BYTE.getSizeBits();
+  static final int shortBits = DataType.SHORT.getSizeBits();
+  static final int intBits = DataType.INT.getSizeBits();
+  static final int longBits = DataType.LONG.getSizeBits();
+  // default size for each row, grows as needed
+  static final int DEFAULT_ROW_SIZE = 8;
+
+  static final double FACTOR = 1.25;
+
+  final long taskId = ThreadLocalTaskInfo.getCarbonTaskInfo().getTaskId();
+
+  // memory allocated by Unsafe
+  MemoryBlock memoryBlock;
--- End diff --

SafeDecimalColumnPage  will handle both fixed and variable implementation 
of decimal data based on precsion. So it is also required to be extended from 
VarLengthColumnPageBase


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138941549
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
 ---
@@ -22,21 +22,48 @@
 import java.util.List;
 
 import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.memory.CarbonUnsafe;
+import org.apache.carbondata.core.memory.MemoryBlock;
 import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.memory.UnsafeMemoryManager;
 import org.apache.carbondata.core.metadata.datatype.DataType;
 import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.core.util.ThreadLocalTaskInfo;
 
+import static org.apache.carbondata.core.metadata.datatype.DataType.BYTE;
 import static 
org.apache.carbondata.core.metadata.datatype.DataType.DECIMAL;
 
 public abstract class VarLengthColumnPageBase extends ColumnPage {
 
+  static final int byteBits = BYTE.getSizeBits();
+  static final int shortBits = DataType.SHORT.getSizeBits();
+  static final int intBits = DataType.INT.getSizeBits();
+  static final int longBits = DataType.LONG.getSizeBits();
+  // default size for each row, grows as needed
+  static final int DEFAULT_ROW_SIZE = 8;
+
+  static final double FACTOR = 1.25;
+
+  final long taskId = ThreadLocalTaskInfo.getCarbonTaskInfo().getTaskId();
+
+  // memory allocated by Unsafe
+  MemoryBlock memoryBlock;
--- End diff --

SafeDecimalColumnPage  will handle both fixed and variable implementation 
of decimal data based on precsion. So it is also required to be extended from 
VarLengthColumnPageBase


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138936085
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
 ---
@@ -22,21 +22,48 @@
 import java.util.List;
 
 import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.memory.CarbonUnsafe;
+import org.apache.carbondata.core.memory.MemoryBlock;
 import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.memory.UnsafeMemoryManager;
 import org.apache.carbondata.core.metadata.datatype.DataType;
 import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.core.util.ThreadLocalTaskInfo;
 
+import static org.apache.carbondata.core.metadata.datatype.DataType.BYTE;
 import static 
org.apache.carbondata.core.metadata.datatype.DataType.DECIMAL;
 
 public abstract class VarLengthColumnPageBase extends ColumnPage {
 
+  static final int byteBits = BYTE.getSizeBits();
+  static final int shortBits = DataType.SHORT.getSizeBits();
+  static final int intBits = DataType.INT.getSizeBits();
+  static final int longBits = DataType.LONG.getSizeBits();
+  // default size for each row, grows as needed
+  static final int DEFAULT_ROW_SIZE = 8;
+
+  static final double FACTOR = 1.25;
+
+  final long taskId = ThreadLocalTaskInfo.getCarbonTaskInfo().getTaskId();
+
+  // memory allocated by Unsafe
+  MemoryBlock memoryBlock;
--- End diff --

This is for Unsafe column page only, right


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138931778
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeDecimalColumnPage.java
 ---
@@ -0,0 +1,274 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page;
+
+import java.math.BigDecimal;
+
+import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.memory.CarbonUnsafe;
+import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.memory.UnsafeMemoryManager;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.util.ByteUtil;
+
+/**
+ * Represents a columnar data for decimal data type column for one page
+ */
+public class UnsafeDecimalColumnPage extends DecimalColumnPage {
+
+  UnsafeDecimalColumnPage(TableSpec.ColumnSpec columnSpec, DataType 
dataType, int pageSize)
+  throws MemoryException {
+super(columnSpec, dataType, pageSize);
+capacity = (int) (pageSize * DEFAULT_ROW_SIZE * FACTOR);
+initMemory();
+  }
+
+  UnsafeDecimalColumnPage(TableSpec.ColumnSpec columnSpec, DataType 
dataType, int pageSize,
+  int capacity) throws MemoryException {
+super(columnSpec, dataType, pageSize);
+this.capacity = capacity;
+initMemory();
+  }
+
+  private void initMemory() throws MemoryException {
+switch (dataType) {
+  case BYTE:
+  case SHORT:
+  case INT:
+  case LONG:
+int size = pageSize << dataType.getSizeBits();
+memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId, 
size);
+baseAddress = memoryBlock.getBaseObject();
+baseOffset = memoryBlock.getBaseOffset();
+break;
+  case SHORT_INT:
+size = pageSize * 3;
+memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId, 
size);
+baseAddress = memoryBlock.getBaseObject();
+baseOffset = memoryBlock.getBaseOffset();
+break;
+  case DECIMAL:
+  case STRING:
--- End diff --

It can not be String, right?


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138916072
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/LazyColumnPage.java
 ---
@@ -91,9 +93,24 @@ public float getFloat(int rowId) {
 throw new UnsupportedOperationException("internal error");
   }
 
-  @Override
-  public BigDecimal getDecimal(int rowId) {
-return columnPage.getDecimal(rowId);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

Can you move all @Override to previous line


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r138915973
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.page;
+
+import org.apache.carbondata.core.datastore.TableSpec;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import 
org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
+
+/**
+ * Represent a columnar data in one page for one column of decimal data 
type
+ */
+public abstract class DecimalColumnPage extends VarLengthColumnPageBase {
+
+  /**
+   * decimal converter instance
+   */
+  DecimalConverterFactory.DecimalConverter decimalConverter;
+
+  DecimalColumnPage(TableSpec.ColumnSpec columnSpec, DataType dataType, 
int pageSize) {
+super(columnSpec, dataType, pageSize);
+decimalConverter = DecimalConverterFactory.INSTANCE
+.getDecimalConverter(columnSpec.getPrecision(), 
columnSpec.getScale());
+  }
+
+  public DecimalConverterFactory.DecimalConverter getDecimalConverter() {
+return decimalConverter;
+  }
+
+  @Override public byte[] getBytePage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public short[] getShortPage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public byte[] getShortIntPage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public int[] getIntPage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public long[] getLongPage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public float[] getFloatPage() {
+throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  }
+
+  @Override public double[] getDoublePage() {
--- End diff --

Can you move all `@Override` to previous line


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r137273402
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java
 ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
 return doubleData[rowId];
   }
 
-  @Override
-  public BigDecimal getDecimal(int rowId) {
-throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

I think adding converter inside the column page make it a bit complex. I 
think there are two mays to make the code clean:
1. Store BigDecimal array in the ColumnPage and do the conversion to int or 
long when come to encoding part, by implementing a ColumnPageValueConverter
2. Create DecimalColumnPage class and add the conversion logic in 
DecimalColumnPage. This logic is the extra logic for decimal only. If you do 
like this, we can consider to rename `FixLengthColumnPage` to 
`PrimitiveColumnPage` and `VarLengthColumnPage` to `StringColumnPage` and the 
3rd one is `DecimalColumnPage`. The responsibility of each class will be more 
clear.


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-09-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r137270300
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingStrategy.java
 ---
@@ -104,18 +107,31 @@ private ColumnPageEncoder 
createEncoderForMeasure(ColumnPage columnPage) {
   case SHORT:
   case INT:
   case LONG:
-return 
selectCodecByAlgorithmForIntegral(stats).createEncoder(null);
+return selectCodecByAlgorithmForIntegral(stats,
+
DecimalConverterFactory.DecimalConverterType.DECIMAL_LONG).createEncoder(null);
+  case DECIMAL:
+return createEncoderForDecimalDataTypeMeasure(columnPage);
--- End diff --

Rename as others, `selectCodecByAlgorithmForDecimal`. 


---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-08-31 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r136260340
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java
 ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
 return doubleData[rowId];
   }
 
-  @Override
-  public BigDecimal getDecimal(int rowId) {
-throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

Please correct me If I am wrongFrom my opinion it will not be any 
advantage to add DecimalColumnPage as we are not doing anything extra if we 
introduce DecimalColumnPage.
Basically we are storing values either as fixed or variable length and 
decimal values also falls in these 2 categories. So according to current design 
I believe it may not be required to introduce 1 more type of page until and 
unless we have pages for all other datatypes separately.
Please share your suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] carbondata pull request #1297: [CARBONDATA-1429] Add a value based compressi...

2017-08-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1297#discussion_r136248243
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/SafeFixLengthColumnPage.java
 ---
@@ -165,9 +165,28 @@ public double getDouble(int rowId) {
 return doubleData[rowId];
   }
 
-  @Override
-  public BigDecimal getDecimal(int rowId) {
-throw new UnsupportedOperationException("invalid data type: " + 
dataType);
+  @Override public BigDecimal getDecimal(int rowId) {
--- End diff --

I think it is better to add a DecimalColumnPage instead of using 
FixLengthColumnPage


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---