[jira] [Updated] (CARBONDATA-882) Add SORT_COLUMNS option support in dataframe writer
     [ https://issues.apache.org/jira/browse/CARBONDATA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-882:
--------------------------------
    Description: User should be able to specify the SORT_COLUMNS option when using dataframe.write  (was: User can specify not to sort during loading, by adding an option in dataframe.write)

> Add SORT_COLUMNS option support in dataframe writer
> ---------------------------------------------------
>
>          Key: CARBONDATA-882
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-882
>      Project: CarbonData
>   Issue Type: Improvement
>     Reporter: Jacky Li
>      Fix For: 1.2.0-incubating
>
> User should be able to specify the SORT_COLUMNS option when using dataframe.write

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (CARBONDATA-882) Add SORT_COLUMNS option support in dataframe writer
     [ https://issues.apache.org/jira/browse/CARBONDATA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-882:
--------------------------------
    Summary: Add SORT_COLUMNS option support in dataframe writer  (was: Add no sort support in dataframe writer)

> Add SORT_COLUMNS option support in dataframe writer
> ---------------------------------------------------
>
>          Key: CARBONDATA-882
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-882
>      Project: CarbonData
>   Issue Type: Improvement
>     Reporter: Jacky Li
>      Fix For: 1.2.0-incubating
>
> User can specify not to sort during loading, by adding an option in dataframe.write
[jira] [Created] (CARBONDATA-882) Add no sort support in dataframe writer
Jacky Li created CARBONDATA-882:
--------------------------------

     Summary: Add no sort support in dataframe writer
         Key: CARBONDATA-882
         URL: https://issues.apache.org/jira/browse/CARBONDATA-882
     Project: CarbonData
  Issue Type: Improvement
    Reporter: Jacky Li
     Fix For: 1.2.0-incubating

User can specify not to sort during loading, by adding an option in dataframe.write
[jira] [Resolved] (CARBONDATA-830) Incorrect schedule for NewCarbonDataLoadRDD
     [ https://issues.apache.org/jira/browse/CARBONDATA-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-830.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Incorrect schedule for NewCarbonDataLoadRDD
> -------------------------------------------
>
>               Key: CARBONDATA-830
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-830
>           Project: CarbonData
>        Issue Type: Bug
>        Components: spark-integration
>  Affects Versions: 1.0.0-incubating
>       Environment: Spark 2.1 + Carbon 1.0.0
>          Reporter: Weizhong
>          Assignee: Weizhong
>          Priority: Minor
>           Fix For: 1.1.0-incubating
>
>         Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently NewCarbonDataLoadRDD's getPreferredLocations returns all node locations rather than one, so Spark may pick the same node for two tasks. One node then gets overloaded with tasks while another has none, degrading performance even when there is no failure.
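The scheduling fix above can be sketched in Python (an illustration only, not CarbonData's actual code): when each partition advertises exactly one preferred host, a locality-aware scheduler has no reason to pile several tasks onto the same node, whereas advertising every host for every partition makes all placements look equally "local".

```python
def assign_preferred_hosts(num_partitions, hosts):
    """Round-robin: give each partition exactly one preferred host,
    instead of listing every host for every partition."""
    return [[hosts[i % len(hosts)]] for i in range(num_partitions)]

# Four load tasks over two nodes: each node is preferred by exactly
# two partitions, so no node is overloaded while another sits idle.
locs = assign_preferred_hosts(4, ["node1", "node2"])
```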
[jira] [Resolved] (CARBONDATA-821) Remove Kettle related code and flow from carbon.
     [ https://issues.apache.org/jira/browse/CARBONDATA-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-821.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Remove Kettle related code and flow from carbon.
> ------------------------------------------------
>
>          Key: CARBONDATA-821
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-821
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: Ravindra Pesala
>     Assignee: Ravindra Pesala
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Remove Kettle related code and flow from carbon. It is difficult for developers to handle all bugs and features in both flows.
[jira] [Resolved] (CARBONDATA-832) Data loading is failing with duplicate header column in csv file
     [ https://issues.apache.org/jira/browse/CARBONDATA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-832.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Data loading is failing with duplicate header column in csv file
> ----------------------------------------------------------------
>
>          Key: CARBONDATA-832
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-832
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: kumar vishal
>     Assignee: kumar vishal
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 20m
> Remaining Estimate: 0h
>
> Problem: data mismatch when the csv file has a duplicate column header.
> Cause: the row parser implementation's logic for resolving column indexes is faulty.
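One common way such an index-resolution bug arises (a hypothetical sketch, not the actual row-parser code): building a name-to-index map silently keeps only the last occurrence of a duplicate header, so values land in the wrong columns. Tracking every position per name avoids the collision.

```python
def column_indexes_buggy(header):
    # A dict comprehension keeps only the LAST index of a duplicated name,
    # so one of the duplicate columns is silently lost.
    return {name: i for i, name in enumerate(header)}

def column_indexes_fixed(header):
    """Keep every occurrence: name -> list of positions."""
    indexes = {}
    for i, name in enumerate(header):
        indexes.setdefault(name, []).append(i)
    return indexes
```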
[jira] [Created] (CARBONDATA-829) DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL
Jacky Li created CARBONDATA-829:
--------------------------------

     Summary: DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL
         Key: CARBONDATA-829
         URL: https://issues.apache.org/jira/browse/CARBONDATA-829
     Project: CarbonData
  Issue Type: Bug
    Reporter: Jacky Li

When creating a table for TPC-H, found that the following operation fails:

{code}
create table car(
  L_SHIPDATE string, L_SHIPMODE string, L_SHIPINSTRUCT string, L_RETURNFLAG string,
  L_RECEIPTDATE string, L_ORDERKEY string, L_PARTKEY string, L_SUPPKEY string,
  L_LINENUMBER int, L_QUANTITY decimal, L_EXTENDEDPRICE decimal, L_DISCOUNT decimal,
  L_TAX decimal, L_LINESTATUS string, L_COMMITDATE string, L_COMMENT string
)
USING org.apache.spark.sql.CarbonSource
OPTIONS (tableName "car", DICTIONARY_EXCLUDE "L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_COMMENT");
{code}
[jira] [Created] (CARBONDATA-827) Query statistics log format is incorrect
Jacky Li created CARBONDATA-827:
--------------------------------

     Summary: Query statistics log format is incorrect
         Key: CARBONDATA-827
         URL: https://issues.apache.org/jira/browse/CARBONDATA-827
     Project: CarbonData
  Issue Type: Bug
    Reporter: Jacky Li

The output log for query statistics contains repeated numbers, which is incorrect.
[jira] [Resolved] (CARBONDATA-696) NPE when select query run on measure having double data type without fraction.
     [ https://issues.apache.org/jira/browse/CARBONDATA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-696.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> NPE when select query run on measure having double data type without fraction.
> ------------------------------------------------------------------------------
>
>               Key: CARBONDATA-696
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-696
>           Project: CarbonData
>        Issue Type: Bug
>        Components: data-query
>  Affects Versions: 1.0.0-incubating
>          Reporter: Babulal
>          Assignee: Kunal Kapoor
>           Fix For: 1.1.0-incubating
>
>       Attachments: logs, oscon_10.csv
>
>         Time Spent: 1h
> Remaining Estimate: 0h
>
> Create table as below
> {code}
> cc.sql("create table oscon_carbon_old (CUST_PRFRD_FLG String,PROD_BRAND_NAME String,PROD_COLOR String,CUST_LAST_RVW_DATE String,
> CUST_COUNTRY String,CUST_CITY String,PRODUCT_NAME String,CUST_JOB_TITLE String,CUST_STATE String,CUST_BUY_POTENTIAL String,
> PRODUCT_MODEL String,ITM_ID String,ITM_NAME String,PRMTION_ID String,PRMTION_NAME String,SHP_MODE_ID String,SHP_MODE String,
> DELIVERY_COUNTRY String,DELIVERY_STATE String,DELIVERY_CITY String,DELIVERY_DISTRICT String,ACTIVE_EMUI_VERSION String,
> WH_NAME String,STR_ORDER_DATE String,OL_ORDER_NO String,OL_ORDER_DATE String,OL_SITE String,CUST_FIRST_NAME String,
> CUST_LAST_NAME String,CUST_BIRTH_DY String,CUST_BIRTH_MM String,CUST_BIRTH_YR String,CUST_BIRTH_COUNTRY String,CUST_SEX String,
> CUST_ADDRESS_ID String,CUST_STREET_NO String,CUST_STREET_NAME String,CUST_AGE String,CUST_SUITE_NO String,CUST_ZIP String,
> CUST_COUNTY String,PRODUCT_ID String,PROD_SHELL_COLOR String,DEVICE_NAME String,PROD_SHORT_DESC String,PROD_LONG_DESC String,
> PROD_THUMB String,PROD_IMAGE String,PROD_UPDATE_DATE String,PROD_LIVE String,PROD_LOC String,PROD_RAM String,PROD_ROM String,
> PROD_CPU_CLOCK String,PROD_SERIES String,ITM_REC_START_DATE String,ITM_REC_END_DATE String,ITM_BRAND_ID String,ITM_BRAND String,
> ITM_CLASS_ID String,ITM_CLASS String,ITM_CATEGORY_ID String,ITM_CATEGORY String,ITM_MANUFACT_ID String,ITM_MANUFACT String,
> ITM_FORMULATION String,ITM_COLOR String,ITM_CONTAINER String,ITM_MANAGER_ID String,PRM_START_DATE String,PRM_END_DATE String,
> PRM_CHANNEL_DMAIL String,PRM_CHANNEL_EMAIL String,PRM_CHANNEL_CAT String,PRM_CHANNEL_TV String,PRM_CHANNEL_RADIO String,
> PRM_CHANNEL_PRESS String,PRM_CHANNEL_EVENT String,PRM_CHANNEL_DEMO String,PRM_CHANNEL_DETAILS String,PRM_PURPOSE String,
> PRM_DSCNT_ACTIVE String,SHP_CODE String,SHP_CARRIER String,SHP_CONTRACT String,CHECK_DATE String,CHECK_YR String,CHECK_MM String,
> CHECK_DY String,CHECK_HOUR String,BOM String,INSIDE_NAME String,PACKING_DATE String,PACKING_YR String,PACKING_MM String,
> PACKING_DY String,PACKING_HOUR String,DELIVERY_PROVINCE String,PACKING_LIST_NO String,ACTIVE_CHECK_TIME String,
> ACTIVE_CHECK_YR String,ACTIVE_CHECK_MM String,ACTIVE_CHECK_DY String,ACTIVE_CHECK_HOUR String,ACTIVE_AREA_ID String,
> ACTIVE_COUNTRY String,ACTIVE_PROVINCE String,ACTIVE_CITY String,ACTIVE_DISTRICT String,ACTIVE_NETWORK String,
> ACTIVE_FIRMWARE_VER String,ACTIVE_OS_VERSION String,LATEST_CHECK_TIME String,LATEST_CHECK_YR String,LATEST_CHECK_MM String,
> LATEST_CHECK_DY String,LATEST_CHECK_HOUR String,LATEST_AREAID String,LATEST_COUNTRY String,LATEST_PROVINCE String,
> LATEST_CITY String,LATEST_DISTRICT String,LATEST_FIRMWARE_VER String,LATEST_EMUI_VERSION String,LATEST_OS_VERSION String,
> LATEST_NETWORK String,WH_ID String,WH_STREET_NO String,WH_STREET_NAME String,WH_STREET_TYPE String,WH_SUITE_NO String,
> WH_CITY String,WH_COUNTY String,WH_STATE String,WH_ZIP String,WH_COUNTRY String,OL_SITE_DESC String,OL_RET_ORDER_NO String,
> OL_RET_DATE String,PROD_MODEL_ID String,CUST_ID String,PROD_UNQ_MDL_ID String,CUST_NICK_NAME String,CUST_LOGIN String,
> CUST_EMAIL_ADDR String,PROD_UNQ_DEVICE_ADDR String,PROD_UQ_UUID String,PROD_BAR_CODE String,TRACKING_NO String,
> STR_ORDER_NO String,CUST_DEP_COUNT double,CUST_VEHICLE_COUNT double,CUST_ADDRESS_CNT double,CUST_CRNT_CDEMO_CNT double,
> CUST_CRNT_HDEMO_CNT double,CUST_CRNT_ADDR_DM double,CUST_FIRST_SHIPTO_CNT double,CUST_FIRST_SALES_CNT double,
> CUST_GMT_OFFSET double,CUST_DEMO_CNT double,CUST_INCOME double,PROD_UNLIMITED double,PROD_OFF_PRICE double,PROD_UNITS double,
> TOTAL_PRD_COST double,TOTAL_PRD_DISC double,PROD_WEIGHT double,REG_UNIT_PRICE double,EXTENDED_AMT double,
> UNIT_PRICE_DSCNT_PCT double,DSCNT_AMT double,PROD_STD_CST double,TOTAL_TX_AMT double,FREIGHT_CHRG double,WAITING_PERIOD double,
> DELIVERY_PERIOD double,ITM_CRNT_PRICE double,ITM_UNITS double,ITM_WSLE_CST double,ITM_SIZE double,PRM_CST double,
> PRM_RESPONSE_TARGET double,PRM_ITM_DM double,SHP_MODE_CNT double,WH_GMT_OFFSET
[jira] [Resolved] (CARBONDATA-818) The file_name stored in carbonindex is wrong
     [ https://issues.apache.org/jira/browse/CARBONDATA-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-818.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> The file_name stored in carbonindex is wrong
> --------------------------------------------
>
>          Key: CARBONDATA-818
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-818
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: Yadong Qi
>     Assignee: Yadong Qi
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> The file_name stored in carbonindex is a local path that is used on the executor as a temp dir:
> {code}
> /tmp/6937581525189542/0/default/carbon_v3/Fact/Part0/Segment_0/0/part-0-0_batchno0-0-1490345609845.carbondata
> {code}
> But we want to store only the actual carbondata file name:
> {code}
> part-0-0_batchno0-0-1490345609845.carbondata
> {code}
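The intended fix amounts to stripping the executor-local temp directory before writing the index entry and keeping only the file name. A minimal sketch of that transformation:

```python
import os

def index_file_name(local_path):
    """Store only the carbondata file name in the index,
    not the executor's temporary directory path."""
    return os.path.basename(local_path)

# The temp path from the report reduces to just the data file name.
p = ("/tmp/6937581525189542/0/default/carbon_v3/Fact/Part0/Segment_0/0/"
     "part-0-0_batchno0-0-1490345609845.carbondata")
```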
[jira] [Resolved] (CARBONDATA-820) Redundant BitSet created in data load
     [ https://issues.apache.org/jira/browse/CARBONDATA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-820.
---------------------------------
    Resolution: Fixed
      Assignee: Jacky Li

> Redundant BitSet created in data load
> -------------------------------------
>
>               Key: CARBONDATA-820
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-820
>           Project: CarbonData
>        Issue Type: Bug
>  Affects Versions: 1.0.0-incubating
>          Reporter: Jacky Li
>          Assignee: Jacky Li
>          Priority: Minor
>           Fix For: 1.1.0-incubating
>
>         Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In the CarbonFactDataHandlerColumnar.getMeasureNullValueIndexBitSet method
[jira] [Created] (CARBONDATA-823) Refactoring of data write step
Jacky Li created CARBONDATA-823:
--------------------------------

     Summary: Refactoring of data write step
         Key: CARBONDATA-823
         URL: https://issues.apache.org/jira/browse/CARBONDATA-823
     Project: CarbonData
  Issue Type: Improvement
    Reporter: Jacky Li
     Fix For: 1.1.0-incubating
[jira] [Resolved] (CARBONDATA-783) Loading data with Single Pass 'true' option is throwing an exception
     [ https://issues.apache.org/jira/browse/CARBONDATA-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-783.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Loading data with Single Pass 'true' option is throwing an exception
> --------------------------------------------------------------------
>
>               Key: CARBONDATA-783
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-783
>           Project: CarbonData
>        Issue Type: Bug
>        Components: data-query
>  Affects Versions: 1.1.0-incubating
>       Environment: spark 2.1
>          Reporter: Geetika Gupta
>          Assignee: Ravindra Pesala
>           Fix For: 1.1.0-incubating
>
>       Attachments: 7000_UniqData.csv
>
>         Time Spent: 1.5h
> Remaining Estimate: 0h
>
> I tried to create the table using the following query:
> {code}
> CREATE TABLE uniq_include_dictionary (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp,
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),
> Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format'
> TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,Double_COLUMN2,DECIMAL_COLUMN2');
> {code}
> Table creation was successful, but when I tried to load data into the table it showed the following error:
> {code}
> ERROR 16-03 13:41:32,354 - nioEventLoopGroup-8-2
> java.lang.IndexOutOfBoundsException: readerIndex(64) + length(25) exceeds writerIndex(80): UnpooledUnsafeDirectByteBuf(ridx: 64, widx: 80, cap: 80)
> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1161)
> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1155)
> 	at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:694)
> 	at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:702)
> 	at org.apache.carbondata.core.dictionary.generator.key.DictionaryMessage.readData(DictionaryMessage.java:70)
> 	at org.apache.carbondata.core.dictionary.server.DictionaryServerHandler.channelRead(DictionaryServerHandler.java:59)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
> 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
> 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
> 	at java.lang.Thread.run(Thread.java:745)
> ERROR 16-03 13:41:32,355 - nioEventLoopGroup-8-2 exceptionCaught
> java.lang.IndexOutOfBoundsException: readerIndex(64) + length(25) exceeds writerIndex(80): UnpooledUnsafeDirectByteBuf(ridx: 64, widx: 80, cap: 80)
> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1161)
> 	at io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1155)
> 	at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:694)
> 	at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:702)
> 	at org.apache.carbondata.core.dictionary.generator.key.DictionaryMessage.readData(DictionaryMessage.java:70)
> 	at org.apache.carbondata.core.dictionary.server.DictionaryServerHandler.channelRead(DictionaryServerHandler.java:59)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerCont
[jira] [Resolved] (CARBONDATA-809) Union with alias is returning wrong result.
     [ https://issues.apache.org/jira/browse/CARBONDATA-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-809.
---------------------------------
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.1.0-incubating

> Union with alias is returning wrong result.
> -------------------------------------------
>
>          Key: CARBONDATA-809
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-809
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: Ravindra Pesala
>     Assignee: Ravindra Pesala
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 40m
> Remaining Estimate: 0h
>
> Union with alias is returning wrong result.
> Testcase
> {code}
> SELECT t.c1 a FROM (select c1 from carbon_table1 union all select c1 from carbon_table2) t
> {code}
> The above query returns the data from only one table, and it is also duplicated.
[jira] [Resolved] (CARBONDATA-812) make vectorized reader as default reader
     [ https://issues.apache.org/jira/browse/CARBONDATA-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-812.
---------------------------------
    Resolution: Fixed
      Assignee: Jacky Li

> make vectorized reader as default reader
> ----------------------------------------
>
>          Key: CARBONDATA-812
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-812
>      Project: CarbonData
>   Issue Type: Improvement
>     Reporter: Jacky Li
>     Assignee: Jacky Li
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 1h 40m
> Remaining Estimate: 0h
[jira] [Updated] (CARBONDATA-820) Redundant BitSet created in data load
     [ https://issues.apache.org/jira/browse/CARBONDATA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-820:
--------------------------------
    Request participants:   (was: )
             Description: In CarbonFactDataHandlerColumnar.getMeasureNullValueIndexBitSet method

> Redundant BitSet created in data load
> -------------------------------------
>
>               Key: CARBONDATA-820
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-820
>           Project: CarbonData
>        Issue Type: Bug
>  Affects Versions: 1.0.0-incubating
>          Reporter: Jacky Li
>          Priority: Minor
>           Fix For: 1.1.0-incubating
>
> In the CarbonFactDataHandlerColumnar.getMeasureNullValueIndexBitSet method
[jira] [Created] (CARBONDATA-820) Redundant BitSet created in data load
Jacky Li created CARBONDATA-820:
--------------------------------

          Summary: Redundant BitSet created in data load
              Key: CARBONDATA-820
              URL: https://issues.apache.org/jira/browse/CARBONDATA-820
          Project: CarbonData
       Issue Type: Bug
 Affects Versions: 1.0.0-incubating
         Reporter: Jacky Li
         Priority: Minor
          Fix For: 1.1.0-incubating
[jira] [Created] (CARBONDATA-812) make vectorized reader as default reader
Jacky Li created CARBONDATA-812:
--------------------------------

     Summary: make vectorized reader as default reader
         Key: CARBONDATA-812
         URL: https://issues.apache.org/jira/browse/CARBONDATA-812
     Project: CarbonData
  Issue Type: Improvement
    Reporter: Jacky Li
     Fix For: 1.1.0-incubating
[jira] [Resolved] (CARBONDATA-742) Add batch sort to improve the loading performance
     [ https://issues.apache.org/jira/browse/CARBONDATA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-742.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Add batch sort to improve the loading performance
> -------------------------------------------------
>
>          Key: CARBONDATA-742
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-742
>      Project: CarbonData
>   Issue Type: Improvement
>     Reporter: Ravindra Pesala
>     Assignee: Ravindra Pesala
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Current problem:
> The sort step is a major issue because it is a blocking step. It must receive all data and write sort temp files to disk; only then can the data writer step start.
> Solution:
> Make the sort step non-blocking so the data writer step does not wait for it. Process the data in the sort step in batches sized to the machine's in-memory capacity. For example, if the machine can allocate 4 GB to process data in memory, the sort step can sort the data in 2 GB batches and hand each batch to the data writer step. While the data writer step consumes one batch, the sort step receives and sorts the next. All steps work continuously and there is no disk IO in the sort step.
> The data writer step never waits for the sort step: as soon as the sort step has sorted a batch in memory, the data writer can start writing it. This can significantly improve performance.
> Advantages:
> Increases loading performance, since there is no intermediate IO and no blocking sort step. No extra effort is needed for compaction; the current flow can handle it.
> Disadvantages:
> The number of driver-side btrees will increase, so memory usage may grow, but it can be controlled by the current LRU cache implementation.
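The batch-sort pipeline described above can be sketched as a generator (an illustration only; the batch size stands in for the 2 GB in-memory budget, not Carbon's actual implementation): instead of one blocking global sort, each fixed-size batch is sorted in memory and yielded immediately, so a downstream writer can consume one batch while the next is being sorted.

```python
def batch_sorted(rows, batch_size):
    """Yield sorted batches so the writer can consume one batch while
    the sorter works on the next -- no sort-temp files hit the disk."""
    for start in range(0, len(rows), batch_size):
        yield sorted(rows[start:start + batch_size])

# Two batches of 3 rows each, each sorted independently; the trade-off
# is that data is sorted per batch rather than globally.
batches = list(batch_sorted([5, 3, 8, 1, 9, 2], 3))
```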
[jira] [Resolved] (CARBONDATA-775) Update Documentation for Supported Datatypes
     [ https://issues.apache.org/jira/browse/CARBONDATA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-775.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Update Documentation for Supported Datatypes
> --------------------------------------------
>
>          Key: CARBONDATA-775
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-775
>      Project: CarbonData
>   Issue Type: Improvement
>   Components: docs
>     Reporter: Pallavi Singh
>     Assignee: Pallavi Singh
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 1.5h
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-730) unsupported type: DecimalType
     [ https://issues.apache.org/jira/browse/CARBONDATA-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-730.
---------------------------------
    Resolution: Fixed

> unsupported type: DecimalType
> -----------------------------
>
>               Key: CARBONDATA-730
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-730
>           Project: CarbonData
>        Issue Type: Improvement
>        Components: spark-integration
>  Affects Versions: 1.0.0-incubating
>       Environment: Spark 1.6.2 Hadoop 2.6
>          Reporter: Sanoj MG
>          Assignee: anubhav tarar
>          Priority: Minor
>           Fix For: 1.1.0-incubating
>
>         Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> The exception below is thrown while trying to save a dataframe with a decimal column type.
> {code}
> scala> df.printSchema
> |-- account: integer (nullable = true)
> |-- currency: integer (nullable = true)
> |-- branch: integer (nullable = true)
> |-- country: integer (nullable = true)
> |-- date: date (nullable = true)
> |-- fcbalance: decimal(16,3) (nullable = true)
> |-- lcbalance: decimal(16,3) (nullable = true)
>
> scala> df.write.format("carbondata").option("tableName", "accBal").option("compress", "true").mode(SaveMode.Overwrite).save()
> java.lang.RuntimeException: unsupported type: DecimalType(16,3)
> 	at scala.sys.package$.error(package.scala:27)
> 	at org.apache.carbondata.spark.CarbonDataFrameWriter.org$apache$carbondata$spark$CarbonDataFrameWriter$$convertToCarbonType(CarbonDataFrameWriter.scala:172)
> 	at org.apache.carbondata.spark.CarbonDataFrameWriter$$anonfun$2.apply(CarbonDataFrameWriter.scala:178)
> 	at org.apache.carbondata.spark.CarbonDataFrameWriter$$anonfun$2.apply(CarbonDataFrameWriter.scala:177)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> {code}
> This works fine with the change below:
> {code}
> git diff
> diff --git a/integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonDataFrameWriter.scala b/integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonDataFrameWriter.scala
> index b843f59..cf9a775 100644
> --- a/integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonDataFrameWriter.scala
> +++ b/integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonDataFrameWriter.scala
> @@ -169,6 +169,7 @@ class CarbonDataFrameWriter(val dataFrame: DataFrame) {
>        case BooleanType => CarbonType.DOUBLE.getName
>        case TimestampType => CarbonType.TIMESTAMP.getName
>        case DateType => CarbonType.DATE.getName
> +      case dt: DecimalType => s"${CarbonType.DECIMAL.getName}(${dt.precision}, ${dt.scale})"
>        case other => sys.error(s"unsupported type: $other")
>      }
>    }
> {code}
> Can I create a pull request?
[jira] [Resolved] (CARBONDATA-769) Support Codegen in CarbonDictionaryDecoder
     [ https://issues.apache.org/jira/browse/CARBONDATA-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-769.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> Support Codegen in CarbonDictionaryDecoder
> ------------------------------------------
>
>          Key: CARBONDATA-769
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-769
>      Project: CarbonData
>   Issue Type: Improvement
>     Reporter: Ravindra Pesala
>     Assignee: Ravindra Pesala
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Support Codegen in CarbonDictionaryDecoder to leverage the whole-stage codegen performance of Spark 2.1.
[jira] [Resolved] (CARBONDATA-762) modify all schemaName->databaseName, cubeName->tableName
     [ https://issues.apache.org/jira/browse/CARBONDATA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-762.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> modify all schemaName->databaseName, cubeName->tableName
> --------------------------------------------------------
>
>          Key: CARBONDATA-762
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-762
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: QiangCai
>     Assignee: Cao, Lionel
>     Priority: Minor
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 1h
> Remaining Estimate: 0h
>
> modify all schemaName->databaseName, cubeName->tableName
[jira] [Resolved] (CARBONDATA-786) Data mismatch if the data is loaded across blocklet groups
     [ https://issues.apache.org/jira/browse/CARBONDATA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-786.
---------------------------------
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.1.0-incubating

> Data mismatch if the data is loaded across blocklet groups
> ----------------------------------------------------------
>
>          Key: CARBONDATA-786
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-786
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: Ravindra Pesala
>     Assignee: Ravindra Pesala
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Data mismatch if the data is loaded across blocklet groups and a filter is applied on the second column onwards.
> Follow testcase
> {code}
> CarbonProperties.getInstance()
>   .addProperty("carbon.blockletgroup.size.in.mb", "16")
>   .addProperty("carbon.enable.vector.reader", "true")
>   .addProperty("enable.unsafe.sort", "true")
> val rdd = sqlContext.sparkContext
>   .parallelize(1 to 120, 4)
>   .map { x =>
>     ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
>       (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
>   }.map { x =>
>     Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8, x._9)
>   }
> val schema = StructType(
>   Seq(
>     StructField("city", StringType, nullable = false),
>     StructField("country", StringType, nullable = false),
>     StructField("planet", StringType, nullable = false),
>     StructField("id", StringType, nullable = false),
>     StructField("m1", ShortType, nullable = false),
>     StructField("m2", IntegerType, nullable = false),
>     StructField("m3", LongType, nullable = false),
>     StructField("m4", DoubleType, nullable = false),
>     StructField("m5", DoubleType, nullable = false)
>   )
> )
> val input = sqlContext.createDataFrame(rdd, schema)
> sql(s"drop table if exists testBigData")
> input.write
>   .format("carbondata")
>   .option("tableName", "testBigData")
>   .option("tempCSV", "false")
>   .option("single_pass", "true")
>   .option("dictionary_exclude", "id") // id is high cardinality column
>   .mode(SaveMode.Overwrite)
>   .save()
> sql(s"select city, sum(m1) from testBigData " +
>   s"where country='country12' group by city order by city").show()
> {code}
> The above code is supposed to return the following data, but does not:
> {code}
> +-----+-------+
> | city|sum(m1)|
> +-----+-------+
> |city0|    544|
> |city1|    680|
> |city2|    816|
> |city3|    952|
> |city4|   1088|
> |city5|   1224|
> |city6|   1360|
> |city7|   1496|
> +-----+-------+
> {code}
[jira] [Resolved] (CARBONDATA-753) Fix Date and Timestamp format issues
     [ https://issues.apache.org/jira/browse/CARBONDATA-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-753.
---------------------------------
    Resolution: Fixed

> Fix Date and Timestamp format issues
> ------------------------------------
>
>               Key: CARBONDATA-753
>               URL: https://issues.apache.org/jira/browse/CARBONDATA-753
>           Project: CarbonData
>        Issue Type: Bug
>        Components: core, examples
>  Affects Versions: 1.0.0-incubating
>          Reporter: Liang Chen
>          Assignee: Liang Chen
>          Priority: Minor
>           Fix For: 1.1.0-incubating, 1.0.1-incubating
>
>         Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Fix Date and Timestamp format issues:
> 1. Optimize the description of CARBON_TIMESTAMP_FORMAT & CARBON_DATE_FORMAT in CarbonCommonConstants.java.
> 2. Correct the definition of Date and Timestamp fields in the examples.
> 3. Add an example script showing the raw data's timestamp format. Currently spark.sql.show() by default uses "-mm-dd hh:mm:ss.f" as the Timestamp.toString() format, while users generally want the displayed data to match the raw data format.
[jira] [Resolved] (CARBONDATA-756) RLE encoding issue
     [ https://issues.apache.org/jira/browse/CARBONDATA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-756.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0-incubating

> RLE encoding issue
> ------------------
>
>          Key: CARBONDATA-756
>          URL: https://issues.apache.org/jira/browse/CARBONDATA-756
>      Project: CarbonData
>   Issue Type: Bug
>     Reporter: kumar vishal
>     Assignee: kumar vishal
>      Fix For: 1.1.0-incubating
>
>         Time Spent: 3h
> Remaining Estimate: 0h
>
> Problem: the RLE index size is larger than the actual data size.
> Solution: if the RLE index size is larger than the data size, or is more than 70% of the data size, disable RLE encoding for that column.
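The size-threshold idea can be illustrated with a toy run-length encoder (a sketch under assumed encodings and cost model, not Carbon's actual on-disk format): if the RLE output is not clearly smaller than the raw data, fall back to storing the raw values.

```python
def rle_encode(values):
    """Toy run-length encoding: a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def encode_column(values, threshold=0.7):
    """Disable RLE when its output is not below ~70% of the raw size."""
    runs = rle_encode(values)
    # Each run costs two entries (value + length) in this toy cost model.
    if 2 * len(runs) > threshold * len(values):
        return ("raw", values)
    return ("rle", runs)
```

Highly repetitive data stays RLE-encoded; high-cardinality data falls back to raw storage instead of paying for an index bigger than the data.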
[jira] [Resolved] (CARBONDATA-751) Adding Header and making footer optional
[ https://issues.apache.org/jira/browse/CARBONDATA-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-751. - Resolution: Fixed Assignee: kumar vishal Fix Version/s: 1.1.0-incubating > Adding Header and making footer optional > > > Key: CARBONDATA-751 > URL: https://issues.apache.org/jira/browse/CARBONDATA-751 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal > Fix For: 1.1.0-incubating > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Currently carbon does not support an appendable format. The changes below add > append support to the V3 data file format by making the footer optional and > adding a header to the V3 carbon data file. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-736) Dictionary Loading issue in Decoder
[ https://issues.apache.org/jira/browse/CARBONDATA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-736. - Resolution: Fixed Assignee: kumar vishal Fix Version/s: 1.1.0-incubating > Dictionary Loading issue in Decoder > --- > > Key: CARBONDATA-736 > URL: https://issues.apache.org/jira/browse/CARBONDATA-736 > Project: CarbonData > Issue Type: Bug >Reporter: kumar vishal >Assignee: kumar vishal > Fix For: 1.1.0-incubating > > Time Spent: 40m > Remaining Estimate: 0h > > Problem: > Currently the Carbon dictionary decoder loads dictionary files using the get > API one column at a time; when the number of columns is high, it can use the getAll API to > load dictionary data concurrently. > Solution: > Use the getAll API. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
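The get-vs-getAll change in CARBONDATA-736 can be sketched with a stand-in trait (illustrative names only, not CarbonData's actual dictionary API): the decoder makes one batched `getAll` call instead of one `get` per column, and a real implementation would override `getAll` to load the dictionaries concurrently.

```scala
// Stand-in for a dictionary store keyed by column id; each dictionary maps
// surrogate keys to their original values.
trait DictionaryStore {
  def get(columnId: String): Map[Int, String]

  // One batched call covering many columns. This default delegates to get()
  // sequentially; a real implementation would load the columns concurrently.
  def getAll(columnIds: Seq[String]): Map[String, Map[Int, String]] =
    columnIds.map(id => id -> get(id)).toMap
}

class InMemoryStore(data: Map[String, Map[Int, String]]) extends DictionaryStore {
  def get(columnId: String): Map[Int, String] = data.getOrElse(columnId, Map.empty)
}
```

With hundreds of dictionary columns, the batched entry point gives the store a single place to parallelize file reads, which is the gain the issue describes.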
[jira] [Created] (CARBONDATA-747) Add simple performance test for spark2.1 carbon integration
Jacky Li created CARBONDATA-747: --- Summary: Add simple performance test for spark2.1 carbon integration Key: CARBONDATA-747 URL: https://issues.apache.org/jira/browse/CARBONDATA-747 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Fix For: 1.1.0-incubating -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CARBONDATA-746) Support spark-sql CLI for spark2.1 carbon integration
Jacky Li created CARBONDATA-746: --- Summary: Support spark-sql CLI for spark2.1 carbon integration Key: CARBONDATA-746 URL: https://issues.apache.org/jira/browse/CARBONDATA-746 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.1.0-incubating -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-715) Optimize Single pass data load
[ https://issues.apache.org/jira/browse/CARBONDATA-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-715. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Optimize Single pass data load > -- > > Key: CARBONDATA-715 > URL: https://issues.apache.org/jira/browse/CARBONDATA-715 > Project: CarbonData > Issue Type: Improvement >Reporter: Ravindra Pesala > Fix For: 1.1.0-incubating > > Time Spent: 5h 50m > Remaining Estimate: 0h > > 1. Upgrade to the latest netty-4.1.8. > 2. Optimize the serialization of keys passed over the network. > 3. Launch an individual dictionary client for each loading thread. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-726) Update with V3 format for better IO and processing optimization.
[ https://issues.apache.org/jira/browse/CARBONDATA-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-726. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Update with V3 format for better IO and processing optimization. > > > Key: CARBONDATA-726 > URL: https://issues.apache.org/jira/browse/CARBONDATA-726 > Project: CarbonData > Issue Type: Improvement >Reporter: Ravindra Pesala > Fix For: 1.1.0-incubating > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Problems in the current format: > 1. IO read is slower since it needs multiple seeks on the file to > read column blocklets. The current blocklet size is 12, so it needs to > read from the file multiple times to scan the data of a column. Alternatively > we can increase the blocklet size, but then filter queries suffer because each > blocklet to filter is large. > 2. Decompression is slower in the current format. We use an inverted index for > faster filter queries and NumberCompressor to bit-pack the inverted > index; this is slow, so the number compressor should be avoided. One alternative is to keep the blocklet size within 32000 so that the > inverted index can be written with shorts, but then IO read suffers a lot. > To overcome the above two issues we are introducing the new V3 format. > Here each blocklet has multiple pages of size 32000; the number of pages in a > blocklet is configurable. Since each page stays within the short limit, there is no > need to compress the inverted index. > Max/min values are maintained for each page to further prune filter queries. > The blocklet is read with all its pages at once and kept in offheap memory. > During filtering, the max/min range is checked first, and only if the page > qualifies is it decompressed for further filtering. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
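The per-page max/min pruning that V3 introduces can be sketched in standalone Scala (illustrative types, not the V3 reader's real classes): only pages whose statistics admit the filter value are decompressed at all.

```scala
// Each page (up to 32000 rows) carries min/max statistics written at load time.
case class PageStats(min: Int, max: Int)

// Return the indexes of pages that might contain filterValue; every other
// page is skipped without being decompressed.
def pagesToScan(pages: Seq[PageStats], filterValue: Int): Seq[Int] =
  pages.zipWithIndex.collect {
    case (p, i) if filterValue >= p.min && filterValue <= p.max => i
  }
```

For a blocklet whose pages cover disjoint value ranges, an equality filter typically touches only one page, which is exactly the IO/CPU saving the issue describes.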
[jira] [Resolved] (CARBONDATA-692) Support scalar subquery in carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-692. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Support scalar subquery in carbon > - > > Key: CARBONDATA-692 > URL: https://issues.apache.org/jira/browse/CARBONDATA-692 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Reporter: Ravindra Pesala > Fix For: 1.1.0-incubating > > Time Spent: 0.5h > Remaining Estimate: 0h > > Carbon cannot run scalar subqueries like the one below: > {code} > select sum(salary) from scalarsubquery t1 > where ID < (select sum(ID) from scalarsubquery t2 where t1.name = t2.name) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-705) Make the partition distribution as configurable and keep spark distribution as default
[ https://issues.apache.org/jira/browse/CARBONDATA-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-705. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Make the partition distribution as configurable and keep spark distribution > as default > -- > > Key: CARBONDATA-705 > URL: https://issues.apache.org/jira/browse/CARBONDATA-705 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.1.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Make the partition distribution as configurable and keep spark distribution > as default. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-325) Create table with columns contains spaces in name.
[ https://issues.apache.org/jira/browse/CARBONDATA-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-325. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Create table with columns contains spaces in name. > -- > > Key: CARBONDATA-325 > URL: https://issues.apache.org/jira/browse/CARBONDATA-325 > Project: CarbonData > Issue Type: Bug >Reporter: Harmeet Singh >Assignee: Harmeet Singh > Fix For: 1.1.0-incubating > > Time Spent: 3h 20m > Remaining Estimate: 0h > > I want to create a table using columns whose names contain spaces. I am using the Thrift > Server and Beeline client to access carbon data. Whenever I try to > create a table whose column names contain spaces, I get an error. > Below are the steps: > Step 1: > create table three (`first name` string, `age` int) stored by 'carbondata'; > Whenever I execute the above query, I get the following error: > Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: > Unsupported data type : FieldSchema(name:first name, type:string, > comment:null).getType (state=,code=0) > The error misleadingly suggests that an unsupported data type is being used. > If I remove `stored by 'carbondata'` from the query, it works > fine because it then runs on Hive. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-685) Able to create table with spaces using carbon source
[ https://issues.apache.org/jira/browse/CARBONDATA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-685. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Able to create table with spaces using carbon source > > > Key: CARBONDATA-685 > URL: https://issues.apache.org/jira/browse/CARBONDATA-685 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.0.0-incubating > Environment: spark 2.1 single node cluster >Reporter: anubhav tarar >Assignee: Rahul Kumar >Priority: Trivial > Fix For: 1.1.0-incubating > > > When using the carbon source I am able to create a table whose name contains spaces. > logs > 0: jdbc:hive2://localhost:1> CREATE TABLE table (ID Int, date Timestamp, > country String, name String, phonetype String, serialname String,salary > Int) USING org.apache.spark.sql.CarbonSource OPTIONS("tableName"="t a b l e > "); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected > Here a table whose name contains spaces is created in HDFS; > this should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-690) Carbon data load fails with default option for USE_KETTLE(False)
[ https://issues.apache.org/jira/browse/CARBONDATA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-690. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Carbon data load fails with default option for USE_KETTLE(False) > > > Key: CARBONDATA-690 > URL: https://issues.apache.org/jira/browse/CARBONDATA-690 > Project: CarbonData > Issue Type: Bug > Environment: Spark 2.1 >Reporter: Ramakrishna >Priority: Minor > Fix For: 1.1.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When load query is run with default option for USE_KETTLE, it fails at mdkey > generation. > sample query and issue: > LOAD DATA inpath > 'hdfs://hacluster/user/OSCON/sparkhive/warehouse/communication.db/flow_text_1/20140113_0_120.csv' > into table flow_carbon options('USE_KETTLE'='FALSE', 'DELIMITER'=',', > 'QUOTECHAR'='"','FILEHEADER'='aco_ac,ac_dte,txn_cnt,jrn_par,mfm_jrn_no,cbn_jrn_no,ibs_jrn_no,vch_no,vch_seq,srv_cde,cus_no,bus_cd_no,id_flg,cus_ac,bv_cde,bv_no,txn_dte,txn_time,txn_tlr,txn_bk,txn_br,ety_tlr,ety_bk,ety_br,bus_pss_no,chk_flg,chk_tlr,chk_jrn_no,bus_sys_no,bus_opr_cde,txn_sub_cde,fin_bus_cde,fin_bus_sub_cde,opt_prd_cde,chl,tml_id,sus_no,sus_seq,cho_seq,itm_itm,itm_sub,itm_sss,dc_flg,amt,bal,ccy,spv_flg,vch_vld_dte,pst_bk,pst_br,ec_flg,aco_tlr,opp_ac,opp_ac_nme,opp_bk,gen_flg,his_rec_sum_flg,his_flg,vch_typ,val_dte,opp_ac_flg,cmb_flg,ass_vch_flg,cus_pps_flg,bus_rmk_cde,vch_bus_rmk,tec_rmk_cde,vch_tec_rmk,rsv_ara,own_br,own_bk,gems_last_upd_d,gems_last_upd_d_bat,maps_date,maps_job,dt'); > Error: java.lang.Exception: DataLoad failure: There is an unexpected error: > unable to generate the mdkey (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-222) Query issue for all dimensions are no dictionary columns
[ https://issues.apache.org/jira/browse/CARBONDATA-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-222. - Resolution: Fixed Assignee: Gin-zhj > Query issue for all dimensions are no dictionary columns > > > Key: CARBONDATA-222 > URL: https://issues.apache.org/jira/browse/CARBONDATA-222 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Assignee: Gin-zhj >Priority: Minor > Fix For: 0.1.1-incubating, 0.2.0-incubating > > > step 1: > CREATE TABLE uniqdata_no (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_EXCLUDE'='CUST_NAME,ACTIVE_EMUI_VERSION'); > step 2: > LOAD DATA INPATH 'D:/download/3lakh_3.csv' into table uniqdata_no > OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > step 3: > select * from uniqdata_no limit 5; > the fact file is: > ,,,0 > query failed, catch exception: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 > at > org.apache.carbondata.core.util.ByteUtil$UnsafeComparer.compareTo(ByteUtil.java:197) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.compareIndexes(BTreeDataRefNodeFinder.java:243) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.findFirstLeafNode(BTreeDataRefNodeFinder.java:121) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.findFirstDataBlock(BTreeDataRefNodeFinder.java:80) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfIndex(CarbonInputFormat.java:546) > at > 
org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:473) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:342) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplitsNonFilter(CarbonInputFormat.java:304) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:277) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (CARBONDATA-222) Query issue for all dimensions are no dictionary columns
[ https://issues.apache.org/jira/browse/CARBONDATA-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li reopened CARBONDATA-222: - Assignee: (was: Gin-zhj) > Query issue for all dimensions are no dictionary columns > > > Key: CARBONDATA-222 > URL: https://issues.apache.org/jira/browse/CARBONDATA-222 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Priority: Minor > Fix For: 0.2.0-incubating, 0.1.1-incubating > > > step 1: > CREATE TABLE uniqdata_no (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('DICTIONARY_EXCLUDE'='CUST_NAME,ACTIVE_EMUI_VERSION'); > step 2: > LOAD DATA INPATH 'D:/download/3lakh_3.csv' into table uniqdata_no > OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > step 3: > select * from uniqdata_no limit 5; > the fact file is: > ,,,0 > query failed, catch exception: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 4 > at > org.apache.carbondata.core.util.ByteUtil$UnsafeComparer.compareTo(ByteUtil.java:197) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.compareIndexes(BTreeDataRefNodeFinder.java:243) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.findFirstLeafNode(BTreeDataRefNodeFinder.java:121) > at > org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder.findFirstDataBlock(BTreeDataRefNodeFinder.java:80) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfIndex(CarbonInputFormat.java:546) > at > 
org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:473) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:342) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplitsNonFilter(CarbonInputFormat.java:304) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:277) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CARBONDATA-681) CSVReader related code improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-681. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > CSVReader related code improvement > -- > > Key: CARBONDATA-681 > URL: https://issues.apache.org/jira/browse/CARBONDATA-681 > Project: CarbonData > Issue Type: Sub-task > Components: hadoop-integration >Reporter: Jihong MA >Assignee: Jihong MA > Fix For: 1.1.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Refactor CSV reader support during data loading, and move the > relevant classes out of the Carbon Hadoop component into the data loading > (processing) component. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] (CARBONDATA-683) Reduce test time
Jacky Li created an issue CarbonData / CARBONDATA-683 Reduce test time Issue Type: Improvement Affects Versions: 1.0.0-incubating Assignee: Unassigned Created: 29/Jan/17 10:09 Priority: Major Reporter: Jacky Li Reduce test time by: 1. remove all unnecessary prints 2. make the sample csv files smaller 3. change logger.audit to initialize strings in the constructor
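Item 3 — initializing the constant part of audit strings once — can be sketched as follows. The `AuditLogger` class here is hypothetical; the real change is inside CarbonData's logging code.

```scala
// Build the fixed "[host][user]" prefix once in the constructor instead of
// concatenating it again on every audit() call inside the test suite's hot path.
class AuditLogger(host: String, user: String) {
  private val prefix = s"[$host][$user]"
  def audit(message: String): String = s"$prefix$message"
}
```

Usage matches the audit lines seen elsewhere in these reports, e.g. `new AuditLogger("hadoop-master", "anonymous").audit("Deleted table ...")`.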
[jira] [Resolved] (CARBONDATA-680) Add stats like rows processed in each step. And also fix unsafe sort enable issue.
[ https://issues.apache.org/jira/browse/CARBONDATA-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-680. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.1.0-incubating > Add stats like rows processed in each step. And also fix unsafe sort enable > issue. > -- > > Key: CARBONDATA-680 > URL: https://issues.apache.org/jira/browse/CARBONDATA-680 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala >Priority: Minor > Fix For: 1.1.0-incubating > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Currently stats such as the number of rows processed in each step are not added in the no > kettle flow. Please add the same. > Also, unsafe sort is not enabled even when the user enables it in the > property file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-682) Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala
[ https://issues.apache.org/jira/browse/CARBONDATA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-682: Fix Version/s: (was: 1.0.0-incubating) 1.1.0-incubating > Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala > - > > Key: CARBONDATA-682 > URL: https://issues.apache.org/jira/browse/CARBONDATA-682 > Project: CarbonData > Issue Type: Bug >Reporter: Liang Chen >Assignee: Liang Chen >Priority: Minor > Fix For: 1.1.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-682) Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala
[ https://issues.apache.org/jira/browse/CARBONDATA-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-682. - Resolution: Fixed Fix Version/s: (was: 1.1.0-incubating) 1.0.0-incubating > Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala > - > > Key: CARBONDATA-682 > URL: https://issues.apache.org/jira/browse/CARBONDATA-682 > Project: CarbonData > Issue Type: Bug >Reporter: Liang Chen >Assignee: Liang Chen >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Fix license header for FloatDataTypeTestCase.scala and DateTypeTest.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-659) Should add WhitespaceAround and ParenPad to javastyle
[ https://issues.apache.org/jira/browse/CARBONDATA-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-659. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Should add WhitespaceAround and ParenPad to javastyle > - > > Key: CARBONDATA-659 > URL: https://issues.apache.org/jira/browse/CARBONDATA-659 > Project: CarbonData > Issue Type: Improvement >Reporter: QiangCai >Assignee: QiangCai >Priority: Trivial > Fix For: 1.1.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-676) Code clean
[ https://issues.apache.org/jira/browse/CARBONDATA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-676. - Resolution: Fixed Fix Version/s: 1.1.0-incubating > Code clean > -- > > Key: CARBONDATA-676 > URL: https://issues.apache.org/jira/browse/CARBONDATA-676 > Project: CarbonData > Issue Type: Improvement >Reporter: zhangshunyu >Assignee: zhangshunyu >Priority: Minor > Fix For: 1.1.0-incubating > > Time Spent: 20m > Remaining Estimate: 0h > > To clean some code: > Correct spelling mistakes > Remove unused functions > Iterate over the Array instead of transforming it to a List. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-655) Make nokettle dataload flow as default in carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-655. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Make nokettle dataload flow as default in carbon > > > Key: CARBONDATA-655 > URL: https://issues.apache.org/jira/browse/CARBONDATA-655 > Project: CarbonData > Issue Type: Improvement >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Make nokettle dataload flow as default in carbon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (CARBONDATA-531) Eliminate spark dependency in carbon core
[ https://issues.apache.org/jira/browse/CARBONDATA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li closed CARBONDATA-531. --- Resolution: Invalid Because the code base has changed a lot, this improvement will be considered later > Eliminate spark dependency in carbon core > - > > Key: CARBONDATA-531 > URL: https://issues.apache.org/jira/browse/CARBONDATA-531 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 0.2.0-incubating >Reporter: Jacky Li >Assignee: Jacky Li > Fix For: 1.0.0-incubating > > Time Spent: 1h > Remaining Estimate: 0h > > Clean up the interface and take out the Spark dependency on the carbon-core module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-617) Insert query not working with UNION
[ https://issues.apache.org/jira/browse/CARBONDATA-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-617. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Insert query not working with UNION > --- > > Key: CARBONDATA-617 > URL: https://issues.apache.org/jira/browse/CARBONDATA-617 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.0.0-incubating > Environment: Spark 1.6 > Hadoop 2.6 >Reporter: Deepti Bhardwaj >Assignee: QiangCai >Priority: Minor > Fix For: 1.0.0-incubating > > Attachments: 2000_UniqData.csv, > thrift-error-log-during-insert-with-union > > Time Spent: 3h 20m > Remaining Estimate: 0h > > I created 3 table all having same schema > Create table commands: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > CREATE TABLE student (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > CREATE TABLE department (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > and I loaded the uniqdata and department table with the attached > CSV(2000_UniqData.csv) > and the insert query used to load data in student table was: > insert into student select > 
CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1 > from uniqdata UNION select > CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1 > from department; > When I try to insert data into student with the union operation, it gives > java.lang.Exception: DataLoad failure (attached below). > The union query works well when used alone, but when insert is used with union > it fails. > Also, if I use hive tables instead of carbon tables the insert does not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-638) Move package in carbon-core module
Jacky Li created CARBONDATA-638: --- Summary: Move package in carbon-core module Key: CARBONDATA-638 URL: https://issues.apache.org/jira/browse/CARBONDATA-638 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.0.0-incubating move org.apache.carbondata.core.carbon to org.apache.carbondata.core move org.apache.carbondata.common.ext to org.apache.carbondata.core.service move org.apache.carbondata.common.iudprocessor.iuddata to org.apache.carbondata.core.update move org.apache.carbondata.core.partition to org.apache.carbondata.processing move org.apache.carbondata.fileoperation to org.apache.carbondata.core.fileoperation move org.apache.carbondata.locks to org.apache.carbondata.core.locks move CarbonDataLoadSchema to carbon-processing move all Identifier classes to org.apache.carbondata.core.metadata move org.apache.carbondata.core.datastorage to org.apache.carbondata.core.datastore -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-637) Remove table_status file
Jacky Li created CARBONDATA-637: --- Summary: Remove table_status file Key: CARBONDATA-637 URL: https://issues.apache.org/jira/browse/CARBONDATA-637 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.0.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-622) Should use the same fileheader reader for dict generation and data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-622. - Resolution: Fixed > Should use the same fileheader reader for dict generation and data loading > -- > > Key: CARBONDATA-622 > URL: https://issues.apache.org/jira/browse/CARBONDATA-622 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.0.0-incubating >Reporter: QiangCai >Assignee: QiangCai >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 3h > Remaining Estimate: 0h > > The file header can come from the DDL command or from the CSV file. > 1. If the file header comes from the DDL command, separate it by the > comma "," > 2. If the file header comes from the CSV file, separate it by the > delimiter specified in the DDL command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
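The two splitting rules in CARBONDATA-622 can be sketched as one shared helper (hypothetical signature; the actual fix unifies the reader classes used by dictionary generation and data loading):

```scala
import java.util.regex.Pattern

// Split a file header into column names. Headers supplied in the DDL command
// always use ",", while headers read from the CSV file use the delimiter
// given in the load command's OPTIONS.
def parseHeader(header: String, fromDdl: Boolean, csvDelimiter: String): Seq[String] = {
  val sep = if (fromDdl) "," else csvDelimiter
  header.split(Pattern.quote(sep)).map(_.trim).toSeq
}
```

Routing both dictionary generation and data loading through one such helper guarantees the two phases agree on the column order, which is the point of the issue.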
[jira] [Resolved] (CARBONDATA-607) Cleanup ValueCompressionHolder class and all sub-classes
[ https://issues.apache.org/jira/browse/CARBONDATA-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-607. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Cleanup ValueCompressionHolder class and all sub-classes > > > Key: CARBONDATA-607 > URL: https://issues.apache.org/jira/browse/CARBONDATA-607 > Project: CarbonData > Issue Type: Sub-task > Components: core >Reporter: Jihong MA >Assignee: Jihong MA > Fix For: 1.0.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Rewrite ValueCompressionHolder class as a base class for compressing or > uncompressing numeric data for measurement column chunk. > refactor all sub-classes under > org.apache.carbondata.core.datastorage.store.compression.decimal.* > org.apache.carbondata.core.datastorage.store.compression.nonDecimal.* > org.apache.carbondata.core.datastorage.store.compression.none.* > org.apache.carbondata.core.datastorage.store.compression.type.* > as part of the work, also fix a performance bug to avoid creating unnecessary > compression/uncompression value holder during compression or decompression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-616) Remove the duplicated class CarbonDataWriterException.java
[ https://issues.apache.org/jira/browse/CARBONDATA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-616. - Resolution: Fixed > Remove the duplicated class CarbonDataWriterException.java > -- > > Key: CARBONDATA-616 > URL: https://issues.apache.org/jira/browse/CARBONDATA-616 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 1.0.0-incubating >Reporter: Liang Chen >Assignee: Liang Chen >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 0.5h > Remaining Estimate: 0h > > Remove the duplicated class CarbonDataWriterException.java [1] > [1]org.apache.carbondata.core.writer.exception.CarbonDataWriterException.java > [2]org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException.java > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-595) Drop Table for carbon throws NPE with HDFS lock type.
[ https://issues.apache.org/jira/browse/CARBONDATA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-595. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > Drop Table for carbon throws NPE with HDFS lock type. > - > > Key: CARBONDATA-595 > URL: https://issues.apache.org/jira/browse/CARBONDATA-595 > Project: CarbonData > Issue Type: Bug >Affects Versions: 0.2.0-incubating >Reporter: Babulal >Assignee: Ravindra Pesala >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Start version :- 1.6.2 > Start carbon thrift server > set HDFS LOCK Type > drop table from beeline > 0: jdbc:hive2://hacluster> drop table oscon_new_1; > Error: java.lang.NullPointerException (state=,code=0) > Error in thrift server > 17/01/04 20:40:08 AUDIT DropTableCommand: > [hadoop-master][anonymous][Thread-182]Deleted table [oscon_new_1] under > database [default] > 17/01/04 20:40:08 ERROR AbstractDFSCarbonFile: pool-25-thread-12 Exception > occured:File does not exist: > hdfs://hacluster/opt/CarbonStore/default/oscon_new_1/droptable.lock > 17/01/04 20:40:08 ERROR SparkExecuteStatementOperation: Error executing > query, currentState RUNNING, > java.lang.NullPointerException > at > org.apache.carbondata.core.datastorage.store.filesystem.AbstractDFSCarbonFile.delete(AbstractDFSCarbonFile.java:128) > at > org.apache.carbondata.lcm.locks.HdfsFileLock.unlock(HdfsFileLock.java:110) > at > org.apache.spark.sql.execution.command.DropTableCommand.run(carbonTableSchema.scala:613) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56) > Note :- the lock file and data are deleted successfully, but beeline shows an > ERROR message instead of success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-608) Compilation Error with spark 1.6 profile
[ https://issues.apache.org/jira/browse/CARBONDATA-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-608. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > Compilation Error with spark 1.6 profile > > > Key: CARBONDATA-608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-608 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Reporter: Prabhat Kashyap >Assignee: Ravindra Pesala >Priority: Critical > Fix For: 1.0.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-606) Add a Flink example to read CarbonData files
Jacky Li created CARBONDATA-606: --- Summary: Add a Flink example to read CarbonData files Key: CARBONDATA-606 URL: https://issues.apache.org/jira/browse/CARBONDATA-606 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.0.0-incubating Add a Flink example to read CarbonData files written by Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-572) clean up code for carbon-spark-common module
[ https://issues.apache.org/jira/browse/CARBONDATA-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-572. - Resolution: Fixed Assignee: Jacky Li > clean up code for carbon-spark-common module > > > Key: CARBONDATA-572 > URL: https://issues.apache.org/jira/browse/CARBONDATA-572 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li >Assignee: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-218) Remove Dependency: spark-csv and Unify CSV Reader for dataloading
[ https://issues.apache.org/jira/browse/CARBONDATA-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-218. - Resolution: Fixed > Remove Dependency: spark-csv and Unify CSV Reader for dataloading > - > > Key: CARBONDATA-218 > URL: https://issues.apache.org/jira/browse/CARBONDATA-218 > Project: CarbonData > Issue Type: Improvement >Reporter: QiangCai >Assignee: QiangCai >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-401) Look forward to support reading csv file only once in data loading
[ https://issues.apache.org/jira/browse/CARBONDATA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-401. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Look forward to support reading csv file only once in data loading > --- > > Key: CARBONDATA-401 > URL: https://issues.apache.org/jira/browse/CARBONDATA-401 > Project: CarbonData > Issue Type: Improvement >Reporter: Lionx >Assignee: Lionx > Fix For: 1.0.0-incubating > > Time Spent: 12h 20m > Remaining Estimate: 0h > > Now, in the Carbon data loading module, generating the global dictionary is an independent step: Carbon reads the CSV file twice, once for generating the global dictionary and once for loading the data. We look forward to reading the CSV file only once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
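The single-pass loading CARBONDATA-401 asks for can be sketched in miniature: build each column's global dictionary while encoding rows, instead of one scan to collect dictionary values and a second scan to load. This is an illustrative sketch, not CarbonData's implementation; the function name, column layout, and surrogate-key scheme are all assumptions.

```python
import csv
import io

def load_single_pass(csv_text, dict_columns):
    """Read the CSV once, building per-column dictionaries while
    emitting dictionary-encoded rows, instead of a separate
    dictionary-generation scan followed by a load scan."""
    dictionaries = {col: {} for col in dict_columns}
    encoded_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        encoded = {}
        for col, value in row.items():
            if col in dict_columns:
                # Assign the next surrogate key on first sight of a value.
                d = dictionaries[col]
                encoded[col] = d.setdefault(value, len(d))
            else:
                encoded[col] = value
        encoded_rows.append(encoded)
    return dictionaries, encoded_rows

dicts, rows = load_single_pass(
    "country,amount\nCN,10\nUS,20\nCN,30\n", dict_columns={"country"})
```

The real win the issue aims at is avoiding a second pass over potentially huge input files, at the cost of coordinating dictionary generation with the load itself.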
[jira] [Resolved] (CARBONDATA-558) Load performance bad when use_kettle=false
[ https://issues.apache.org/jira/browse/CARBONDATA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-558. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Load performance bad when use_kettle=false > -- > > Key: CARBONDATA-558 > URL: https://issues.apache.org/jira/browse/CARBONDATA-558 > Project: CarbonData > Issue Type: Bug >Reporter: Gin-zhj >Assignee: Gin-zhj > Fix For: 1.0.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When importing a data file whose measure columns contain many empty strings, with use_kettle=false, load performance declines sharply. I checked the executor logs; many warnings like the following were printed: > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-6 Cant not convert : to Numeric type value. Value considered as null. > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-1 Cant not convert : to Numeric type value. Value considered as null. > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-6 Cant not convert : to Numeric type value. Value considered as null. > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-1 Cant not convert : to Numeric type value. Value considered as null. > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-2 Cant not convert : to Numeric type value. Value considered as null. > 16/12/22 07:03:12 WARN MeasureFieldConverterImpl: pool-22-thread-3 Cant not convert : to Numeric type value. Value considered as null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
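The slowdown reported in CARBONDATA-558 comes from the converter treating every empty measure string as a failed numeric parse and logging a warning each time. A minimal sketch of that behavior; the function name is hypothetical, and CarbonData's actual MeasureFieldConverterImpl is Java, not Python.

```python
import logging

logger = logging.getLogger("MeasureFieldConverter")

def convert_measure(value):
    """Parse one measure field as a float; anything unparseable
    (notably the empty string) becomes None, mirroring how the loader
    treats bad measure values as null.  The per-value warning is what
    floods the executor log when a column is mostly empty."""
    try:
        return float(value)
    except ValueError:
        logger.warning("Can not convert '%s' to Numeric type value. "
                       "Value considered as null.", value)
        return None

converted = [convert_measure(v) for v in ["1.5", "", "2", ""]]
```

With millions of empty values, the exception handling plus one log line per value dominates load time, which is why the warning flood correlates with the sharp decline.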
[jira] [Resolved] (CARBONDATA-564) long time ago, carbon may use dimension table csv file to make dictionary, but now unused, so remove
[ https://issues.apache.org/jira/browse/CARBONDATA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-564. - Resolution: Fixed Assignee: Jay Fix Version/s: 1.0.0-incubating > long time ago, carbon may use dimension table csv file to make dictionary, > but now unused, so remove > > > Key: CARBONDATA-564 > URL: https://issues.apache.org/jira/browse/CARBONDATA-564 > Project: CarbonData > Issue Type: Improvement >Reporter: Jay >Assignee: Jay >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 10m > Remaining Estimate: 0h > > Long ago, carbon could use a dimension table CSV file to build the dictionary, but now, with coldict, allDictionary and so on, there is no need for a dimension table file to build the dictionary. To keep the carbondata code easy to read, these unused parts should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-467) CREATE TABLE extension to support bucket table.
[ https://issues.apache.org/jira/browse/CARBONDATA-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-467. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > CREATE TABLE extension to support bucket table. > --- > > Key: CARBONDATA-467 > URL: https://issues.apache.org/jira/browse/CARBONDATA-467 > Project: CarbonData > Issue Type: Sub-task >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Time Spent: 4h > Remaining Estimate: 0h > > 1. CREATE TABLE Statement extension. > {code} > CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING) > CLUSTERED BY(user_id) INTO 32 BUCKETS STORED BY 'carbondata'; > {code} > 2. Carbon file format update (Thrift definition extension) > 3. Respect to bucket definition during data load. Store the bucketid to > carbondata index file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
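The bucket handling in CARBONDATA-467 boils down to routing each row by a hash of the CLUSTERED BY column modulo the bucket count, then recording that bucket id in the index file. A sketch under the assumption of a generic hash; Python's built-in `hash` stands in for whatever partitioning hash CarbonData actually uses.

```python
from collections import defaultdict

def bucket_id(key, num_buckets=32):
    """Map a CLUSTERED BY value to one of num_buckets buckets.
    The built-in hash is a stand-in for the real partitioning hash."""
    return hash(key) % num_buckets

def bucket_rows(rows, key, num_buckets=32):
    """Group rows by bucket id -- the grouping a bucketed load would
    apply before writing each bucket's data and storing its bucket id
    in the index file."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key], num_buckets)].append(row)
    return buckets

rows = [{"user_id": i} for i in range(64)]
buckets = bucket_rows(rows, "user_id")
```

The payoff is on the read side: a query filtering on `user_id` only needs to scan the one bucket its hash points at, and joins on the bucketed column can avoid a shuffle.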
[jira] [Resolved] (CARBONDATA-576) Add mvn build guide
[ https://issues.apache.org/jira/browse/CARBONDATA-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-576. - Resolution: Fixed > Add mvn build guide > --- > > Key: CARBONDATA-576 > URL: https://issues.apache.org/jira/browse/CARBONDATA-576 > Project: CarbonData > Issue Type: Improvement >Affects Versions: NONE >Reporter: Liang Chen >Assignee: Liang Chen >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1.5h > Remaining Estimate: 0h > > Add mvn build guide to github -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-574) Add thrift server support to Spark 2.0 carbon integration
[ https://issues.apache.org/jira/browse/CARBONDATA-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-574. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > Add thrift server support to Spark 2.0 carbon integration > - > > Key: CARBONDATA-574 > URL: https://issues.apache.org/jira/browse/CARBONDATA-574 > Project: CarbonData > Issue Type: Bug >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add thrift server support to Spark 2.0 carbon integration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-540) Support insertInto without kettle for spark2
[ https://issues.apache.org/jira/browse/CARBONDATA-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-540. - Resolution: Fixed > Support insertInto without kettle for spark2 > > > Key: CARBONDATA-540 > URL: https://issues.apache.org/jira/browse/CARBONDATA-540 > Project: CarbonData > Issue Type: Improvement > Components: data-load >Affects Versions: 1.0.0-incubating >Reporter: QiangCai >Assignee: QiangCai > Fix For: 1.0.0-incubating > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Support insertInto without kettle for spark2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-572) clean up code for carbon-spark-common module
Jacky Li created CARBONDATA-572: --- Summary: clean up code for carbon-spark-common module Key: CARBONDATA-572 URL: https://issues.apache.org/jira/browse/CARBONDATA-572 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-569) clean up code for carbon-processing module
[ https://issues.apache.org/jira/browse/CARBONDATA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-569: Summary: clean up code for carbon-processing module (was: clean up code for carbon-core module ) > clean up code for carbon-processing module > --- > > Key: CARBONDATA-569 > URL: https://issues.apache.org/jira/browse/CARBONDATA-569 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-571) clean up code for carbon-spark module
Jacky Li created CARBONDATA-571: --- Summary: clean up code for carbon-spark module Key: CARBONDATA-571 URL: https://issues.apache.org/jira/browse/CARBONDATA-571 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-570) clean up code for carbon-hadoop module
Jacky Li created CARBONDATA-570: --- Summary: clean up code for carbon-hadoop module Key: CARBONDATA-570 URL: https://issues.apache.org/jira/browse/CARBONDATA-570 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CARBONDATA-569) clean up code for carbon-core module
[ https://issues.apache.org/jira/browse/CARBONDATA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li reopened CARBONDATA-569: - > clean up code for carbon-core module > - > > Key: CARBONDATA-569 > URL: https://issues.apache.org/jira/browse/CARBONDATA-569 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (CARBONDATA-569) clean up code for carbon-core module
[ https://issues.apache.org/jira/browse/CARBONDATA-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li closed CARBONDATA-569. --- Resolution: Duplicate > clean up code for carbon-core module > - > > Key: CARBONDATA-569 > URL: https://issues.apache.org/jira/browse/CARBONDATA-569 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-569) clean up code for carbon-core module
Jacky Li created CARBONDATA-569: --- Summary: clean up code for carbon-core module Key: CARBONDATA-569 URL: https://issues.apache.org/jira/browse/CARBONDATA-569 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-568) clean up code for carbon-core module
Jacky Li created CARBONDATA-568: --- Summary: clean up code for carbon-core module Key: CARBONDATA-568 URL: https://issues.apache.org/jira/browse/CARBONDATA-568 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-566) clean up code for carbon-spark2 module
[ https://issues.apache.org/jira/browse/CARBONDATA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-566: Assignee: Jacky Li > clean up code for carbon-spark2 module > -- > > Key: CARBONDATA-566 > URL: https://issues.apache.org/jira/browse/CARBONDATA-566 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li >Assignee: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-566) clean up code for carbon-spark2 module
Jacky Li created CARBONDATA-566: --- Summary: clean up code for carbon-spark2 module Key: CARBONDATA-566 URL: https://issues.apache.org/jira/browse/CARBONDATA-566 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-565) Clean up code suggested by IDE analyzer
[ https://issues.apache.org/jira/browse/CARBONDATA-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-565: Summary: Clean up code suggested by IDE analyzer (was: Clean up code ) > Clean up code suggested by IDE analyzer > --- > > Key: CARBONDATA-565 > URL: https://issues.apache.org/jira/browse/CARBONDATA-565 > Project: CarbonData > Issue Type: Improvement >Reporter: Jacky Li > Fix For: 1.0.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-565) Clean up code
Jacky Li created CARBONDATA-565: --- Summary: Clean up code Key: CARBONDATA-565 URL: https://issues.apache.org/jira/browse/CARBONDATA-565 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Fix For: 1.0.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-547) Add CarbonSession and enabled parser to use all carbon commands
[ https://issues.apache.org/jira/browse/CARBONDATA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-547. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Add CarbonSession and enabled parser to use all carbon commands > --- > > Key: CARBONDATA-547 > URL: https://issues.apache.org/jira/browse/CARBONDATA-547 > Project: CarbonData > Issue Type: Improvement >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently, DDL commands like CREATE, LOAD, ALTER, DROP, DESCRIBE, SHOW LOADS, DELETE SEGMENTS etc. are not working in the Spark 2.0 integration. So please add CarbonSession and override the SQL parser to make all these commands work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-560) In QueryExecutionException, can not use executorService.shutdownNow() to shut down immediately.
[ https://issues.apache.org/jira/browse/CARBONDATA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-560. - Resolution: Fixed > In QueryExecutionException, can not use executorService.shutdownNow() to shut > down immediately. > --- > > Key: CARBONDATA-560 > URL: https://issues.apache.org/jira/browse/CARBONDATA-560 > Project: CarbonData > Issue Type: Bug >Reporter: Liang Chen >Assignee: Liang Chen >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1h 10m > Remaining Estimate: 0h > > In QueryExecutionException, can not use executorService.shutdownNow() to shut > down immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-563) Select Queries are not working with spark 1.6.2.
[ https://issues.apache.org/jira/browse/CARBONDATA-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-563. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > Select Queries are not working with spark 1.6.2. > - > > Key: CARBONDATA-563 > URL: https://issues.apache.org/jira/browse/CARBONDATA-563 > Project: CarbonData > Issue Type: Bug > Components: core, data-query >Affects Versions: 0.2.0-incubating >Reporter: Babulal >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Attachments: issue_snapshot.jpg > > Time Spent: 20m > Remaining Estimate: 0h > > Create a carbon table: create table x (a int, b string) stored by 'carbondata' > Load data into the carbon table > Run the query select count(*) from x; > It fails with: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to org.apache.spark.sql.catalyst.InternalRow > Log snapshot attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (CARBONDATA-537) Bug fix for DICTIONARY_EXCLUDE option in spark2 integration
[ https://issues.apache.org/jira/browse/CARBONDATA-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li closed CARBONDATA-537. --- Resolution: Won't Fix > Bug fix for DICTIONARY_EXCLUDE option in spark2 integration > --- > > Key: CARBONDATA-537 > URL: https://issues.apache.org/jira/browse/CARBONDATA-537 > Project: CarbonData > Issue Type: Bug >Reporter: Jacky Li > Fix For: 1.0.0-incubating > > Time Spent: 50m > Remaining Estimate: 0h > > 1. Fix bug for the dictionary_exclude option in the spark2 integration. In spark2, the data type name is changed from "string" to "stringtype", but `isStringAndTimestampColDictionaryExclude` was not updated accordingly. > 2. Fix bug for data loading without kettle: in no-kettle loading, the user should not be asked to set the kettle home environment variable. > 3. Clean up scala code style in `GlobalDictionaryUtil` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-412) in windows, when load into table whose name has "_", the old segment will be deleted.
[ https://issues.apache.org/jira/browse/CARBONDATA-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-412. - Resolution: Fixed Assignee: Jay Fix Version/s: 1.0.0-incubating > in windows, when load into table whose name has "_", the old segment will be > deleted. > - > > Key: CARBONDATA-412 > URL: https://issues.apache.org/jira/browse/CARBONDATA-412 > Project: CarbonData > Issue Type: Bug >Reporter: Jay >Assignee: Jay >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 2h > Remaining Estimate: 0h > > When a carbon table name contains "_", such as "load_test", and data is loaded into the table twice, the second load deletes the first load's segment 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-546) Extract data management command to carbon-spark-common module
Jacky Li created CARBONDATA-546: --- Summary: Extract data management command to carbon-spark-common module Key: CARBONDATA-546 URL: https://issues.apache.org/jira/browse/CARBONDATA-546 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.0.0-incubating Currently there are duplicated code for data management command in carbon-spark and carbon-spark2 module. In this PR, following commands are removed from carbonTableSchema.scala and extracted to carbon-spark-common: - ShowLoads - DeleteLoadById - DeleteLoadByDate - CleanFiles -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-519) Enable vector reader in Carbon-Spark 2.0 integration and Carbon layer
[ https://issues.apache.org/jira/browse/CARBONDATA-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-519. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > Enable vector reader in Carbon-Spark 2.0 integration and Carbon layer > - > > Key: CARBONDATA-519 > URL: https://issues.apache.org/jira/browse/CARBONDATA-519 > Project: CarbonData > Issue Type: New Feature >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Time Spent: 6h > Remaining Estimate: 0h > > Spark 2.0 supports a vectorized reader and uses whole-stage codegen to improve performance. Carbon will enable a vectorized reader integrated with Spark to take advantage of the new features of Spark 2.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
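The gain CARBONDATA-519 targets comes from handing Spark column batches instead of one row per `next()` call, so generated code can loop over plain arrays. A minimal sketch of the row-to-column-batch reshaping; the batch shape here is an illustrative dict-of-lists, not Spark's actual ColumnarBatch API.

```python
def to_column_batches(rows, batch_size):
    """Group row-wise data into column-oriented batches, the shape a
    vectorized reader hands to whole-stage-generated code so it can
    iterate arrays instead of materializing one row object at a time."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One list per column, built from the rows of this chunk.
        yield {col: [row[col] for row in chunk] for col in chunk[0]}

batches = list(to_column_batches(
    [{"id": i, "val": i * 10} for i in range(5)], batch_size=2))
```

Per-row virtual-call overhead disappears and the columnar layout is cache- and SIMD-friendly, which is where the vectorized reader's speedup comes from.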
[jira] [Created] (CARBONDATA-539) Return empty row in map reduce application
Jacky Li created CARBONDATA-539: --- Summary: Return empty row in map reduce application Key: CARBONDATA-539 URL: https://issues.apache.org/jira/browse/CARBONDATA-539 Project: CarbonData Issue Type: Bug Reporter: Jacky Li Assignee: Jacky Li Fix For: 1.0.0-incubating There is a bug that Carbon will return empty row in map reduce app if projection columns are not set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-516) [SPARK2]update union class in CarbonLateDecoderRule for Spark 2.x integration
[ https://issues.apache.org/jira/browse/CARBONDATA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-516. - Resolution: Fixed Fix Version/s: 1.0.0-incubating > [SPARK2]update union class in CarbonLateDecoderRule for Spark 2.x integration > - > > Key: CARBONDATA-516 > URL: https://issues.apache.org/jira/browse/CARBONDATA-516 > Project: CarbonData > Issue Type: New Feature >Reporter: QiangCai >Assignee: QiangCai > Fix For: 1.0.0-incubating > > Time Spent: 3h > Remaining Estimate: 0h > > In spark2, Union class is no longer sub-class of BinaryNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-538) Add test case to spark2 integration
Jacky Li created CARBONDATA-538: --- Summary: Add test case to spark2 integration Key: CARBONDATA-538 URL: https://issues.apache.org/jira/browse/CARBONDATA-538 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li Fix For: 1.0.0-incubating Currently spark2 integration has very few test case, it should be improved -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-536) Initialize GlobalDictionaryUtil.updateTableMetadataFunc for Spark 2.x
[ https://issues.apache.org/jira/browse/CARBONDATA-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-536. - > Initialize GlobalDictionaryUtil.updateTableMetadataFunc for Spark 2.x > - > > Key: CARBONDATA-536 > URL: https://issues.apache.org/jira/browse/CARBONDATA-536 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.0.0-incubating >Reporter: QiangCai >Assignee: QiangCai > Fix For: 1.0.0-incubating > > Time Spent: 40m > Remaining Estimate: 0h > > GlobalDictionaryUtil.updateTableMetadataFunc needs to be initialized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-537) Bug fix for DICTIONARY_EXCLUDE option in spark2 integration
Jacky Li created CARBONDATA-537: --- Summary: Bug fix for DICTIONARY_EXCLUDE option in spark2 integration Key: CARBONDATA-537 URL: https://issues.apache.org/jira/browse/CARBONDATA-537 Project: CarbonData Issue Type: Bug Reporter: Jacky Li Fix For: 1.0.0-incubating 1. Fix bug for the dictionary_exclude option in the spark2 integration. In spark2, the data type name is changed from "string" to "stringtype", but `isStringAndTimestampColDictionaryExclude` was not updated accordingly. 2. Fix bug for data loading without kettle: in no-kettle loading, the user should not be asked to set the kettle home environment variable. 3. Clean up scala code style in `GlobalDictionaryUtil` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
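The root cause in item 1 of CARBONDATA-537 is a string comparison that broke when Spark 2.x started reporting type names like "stringtype". A hedged sketch of the kind of normalization such a fix needs; the helper name is hypothetical, not CarbonData's actual `isStringAndTimestampColDictionaryExclude` (which is Scala and takes column metadata, not a bare name).

```python
def is_string_or_timestamp(data_type_name):
    """Accept both Spark 1.x-style names ("string", "timestamp") and
    Spark 2.x-style names ("StringType", "TimestampType") by
    normalizing case and stripping the "Type" suffix before comparing,
    instead of matching one hard-coded spelling."""
    normalized = data_type_name.strip().lower()
    if normalized.endswith("type"):
        normalized = normalized[:-len("type")]
    return normalized in ("string", "timestamp")
```

Normalizing once at the comparison site keeps the check working across Spark versions without scattering version-specific literals through the code.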
[jira] [Resolved] (CARBONDATA-535) carbondata should support datatype: Date and Char
[ https://issues.apache.org/jira/browse/CARBONDATA-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-535. - Resolution: Fixed > carbondata should support datatype: Date and Char > - > > Key: CARBONDATA-535 > URL: https://issues.apache.org/jira/browse/CARBONDATA-535 > Project: CarbonData > Issue Type: Improvement > Components: file-format >Affects Versions: 1.0.0-incubating >Reporter: QiangCai >Assignee: QiangCai > Fix For: 1.0.0-incubating > > > carbondata should support datatype: Date and Char -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-531) Remove spark dependency in carbon core
Jacky Li created CARBONDATA-531: --- Summary: Remove spark dependency in carbon core Key: CARBONDATA-531 URL: https://issues.apache.org/jira/browse/CARBONDATA-531 Project: CarbonData Issue Type: Improvement Affects Versions: 0.2.0-incubating Reporter: Jacky Li Fix For: 1.0.0-incubating Carbon-core module should not depend on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-470) Add unsafe offheap and on-heap sort in carbondata loading
[ https://issues.apache.org/jira/browse/CARBONDATA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-470. - Resolution: Fixed Assignee: Ravindra Pesala Fix Version/s: 1.0.0-incubating > Add unsafe offheap and on-heap sort in carbondata loading > > > Key: CARBONDATA-470 > URL: https://issues.apache.org/jira/browse/CARBONDATA-470 > Project: CarbonData > Issue Type: Improvement >Reporter: Ravindra Pesala >Assignee: Ravindra Pesala > Fix For: 1.0.0-incubating > > Time Spent: 2h 50m > Remaining Estimate: 0h > > In the current carbondata system, loading performance is not encouraging since we need to sort the data at the executor level during data loading. Carbondata collects a batch of data and sorts it before dumping to temporary files, and finally does a merge sort over those temporary files to finish sorting. Here we face two major issues: one is disk IO and the second is GC. Even though we dump to files, carbondata still faces a lot of GC pressure since we sort batch data in-memory before dumping to the temporary files. > To solve the above problems we can introduce unsafe storage and unsafe sort. > Unsafe Storage: the user can configure a memory limit for the amount of data kept in-memory. We keep all the data in contiguous memory, either off-heap or on-heap, using Unsafe. Once the configured limit is exceeded, the remaining data is spilled to disk. > Unsafe Sort: the data stored in-memory using Unsafe can be sorted with an unsafe sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
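The batch-sort-then-merge pipeline CARBONDATA-470 describes can be sketched with ordinary on-heap structures. The issue's actual proposal is to hold and sort these batches in Unsafe off-heap memory to dodge GC, which a Python sketch cannot show; only the spill-and-merge shape is illustrated.

```python
import heapq
import pickle
import tempfile

def external_sort(records, batch_limit):
    """Sort fixed-size batches in memory, spill each sorted batch to a
    temporary file, then merge-sort across the spill files -- the
    load-time sort pipeline the issue describes."""
    spill_files = []
    for start in range(0, len(records), batch_limit):
        batch = sorted(records[start:start + batch_limit])
        f = tempfile.TemporaryFile()
        pickle.dump(batch, f)   # spill the sorted run to disk
        f.seek(0)
        spill_files.append(f)
    # Final merge phase: read each sorted run back and k-way merge.
    runs = [pickle.load(f) for f in spill_files]
    return list(heapq.merge(*runs))

result = external_sort([5, 3, 8, 1, 9, 2, 7], batch_limit=3)
```

The two costs the issue names are both visible here: each record is written and re-read once (disk IO), and each batch is a fully materialized in-memory object before spilling (GC pressure in the JVM case).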
[jira] [Closed] (CARBONDATA-331) Support no compression option while loading
[ https://issues.apache.org/jira/browse/CARBONDATA-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li closed CARBONDATA-331. --- Resolution: Won't Fix > Support no compression option while loading > --- > > Key: CARBONDATA-331 > URL: https://issues.apache.org/jira/browse/CARBONDATA-331 > Project: CarbonData > Issue Type: New Feature >Reporter: Jacky Li > Time Spent: 0.5h > Remaining Estimate: 0h > > Modify the compressor interface and add a DummyCompressor that does no compression. > This interface can be extended later for adding new compressors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CARBONDATA-431) Analysis compression for numeric datatype compared with Parquet/ORC
[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-431: Fix Version/s: 1.0.0-incubating > Analysis compression for numeric datatype compared with Parquet/ORC > --- > > Key: CARBONDATA-431 > URL: https://issues.apache.org/jira/browse/CARBONDATA-431 > Project: CarbonData > Issue Type: Sub-task >Reporter: suo tong >Assignee: Ashok Kumar > Fix For: 1.0.0-incubating > > Time Spent: 2h 50m > Remaining Estimate: 0h > > For the data types, carbon's string type has a better compression ratio, but for numeric types, ORC has the best compression. We should analyse numeric datatypes for carbon to get a better compression ratio. > DataType | Text | Parquet | Orc | Carbon > decimal | 16G | 11G | 6G | 13G > int | 5G | 1G | 1G | 3G > String (high cardinality) | 24G | 22G | 11G | 3G (no dictionary) > String (low cardinality) | 30G | 4G | 4G | 1G (dictionary encode) / 1G (dictionary encode without inverted index) / 3G (no dictionary encode) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-431) Analysis compression for numeric datatype compared with Parquet/ORC
[ https://issues.apache.org/jira/browse/CARBONDATA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-431. - Resolution: Fixed Assignee: Ashok Kumar (was: Raghunandan S) > Analysis compression for numeric datatype compared with Parquet/ORC > --- > > Key: CARBONDATA-431 > URL: https://issues.apache.org/jira/browse/CARBONDATA-431 > Project: CarbonData > Issue Type: Sub-task >Reporter: suo tong >Assignee: Ashok Kumar > Time Spent: 2h 50m > Remaining Estimate: 0h > > For the data types, carbon's string type has a better compression ratio, but for numeric types, ORC has the best compression. We should analyse numeric datatypes for carbon to get a better compression ratio. > DataType | Text | Parquet | Orc | Carbon > decimal | 16G | 11G | 6G | 13G > int | 5G | 1G | 1G | 3G > String (high cardinality) | 24G | 22G | 11G | 3G (no dictionary) > String (low cardinality) | 30G | 4G | 4G | 1G (dictionary encode) / 1G (dictionary encode without inverted index) / 3G (no dictionary encode) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-528) to support octal escape delimiter char
[ https://issues.apache.org/jira/browse/CARBONDATA-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-528. - Resolution: Fixed Assignee: zhaowei > to support octal escape delimiter char > --- > > Key: CARBONDATA-528 > URL: https://issues.apache.org/jira/browse/CARBONDATA-528 > Project: CarbonData > Issue Type: Improvement >Affects Versions: 0.2.0-incubating >Reporter: zhaowei >Assignee: zhaowei >Priority: Minor > Fix For: 1.0.0-incubating > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-521) Depends on more stable class of spark in spark2
[ https://issues.apache.org/jira/browse/CARBONDATA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-521. - Resolution: Fixed Assignee: Fei Wang > Depends on more stable class of spark in spark2 > --- > > Key: CARBONDATA-521 > URL: https://issues.apache.org/jira/browse/CARBONDATA-521 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Fei Wang >Assignee: Fei Wang > Fix For: 1.0.0-incubating > > Time Spent: 20m > Remaining Estimate: 0h > > Avoid using unstable Spark classes in the spark2 integration; otherwise it leads to compatibility issues with Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-520) Executor can not get the read support class
[ https://issues.apache.org/jira/browse/CARBONDATA-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-520. - Resolution: Fixed > Executor can not get the read support class > > > Key: CARBONDATA-520 > URL: https://issues.apache.org/jira/browse/CARBONDATA-520 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Fei Wang >Assignee: Fei Wang > Fix For: 1.0.0-incubating > > Time Spent: 0.5h > Remaining Estimate: 0h > > Executor can not get the read support class, this leads to cast exception > when running carbon on spark2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CARBONDATA-517) Use carbon property to get the store path/kettle home
[ https://issues.apache.org/jira/browse/CARBONDATA-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-517. - Resolution: Fixed > Use carbon property to get the store path/kettle home > - > > Key: CARBONDATA-517 > URL: https://issues.apache.org/jira/browse/CARBONDATA-517 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Affects Versions: 0.2.0-incubating >Reporter: Fei Wang >Assignee: Fei Wang > Fix For: 1.0.0-incubating > > Time Spent: 0.5h > Remaining Estimate: 0h > > To distinguish carbon config from spark config, carbon properties should be used to read carbon settings such as the store path and kettle home. -- This message was sent by Atlassian JIRA (v6.3.4#6332)