[jira] [Commented] (PHOENIX-2784) phoenix-spark: Allow coercion of DATE fields to TIMESTAMP when loading DataFrames
[ https://issues.apache.org/jira/browse/PHOENIX-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256709#comment-15256709 ] maghamravikiran commented on PHOENIX-2784: -- [~jmahonin] The patch looks good. +1 > phoenix-spark: Allow coercion of DATE fields to TIMESTAMP when loading > DataFrames > - > > Key: PHOENIX-2784 > URL: https://issues.apache.org/jira/browse/PHOENIX-2784 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.7.0 > Reporter: Josh Mahonin > Assignee: Josh Mahonin > Priority: Minor > Attachments: PHOENIX-2784.patch > > > The Phoenix DATE type is internally represented as 8 bytes, which can > store a full 'yyyy-MM-dd hh:mm:ss' time component. However, Spark SQL follows > the SQL Date spec and keeps only the 'yyyy-MM-dd' portion as a 4-byte type. > When loading Phoenix DATE columns using the Spark DataFrame API, the > 'hh:mm:ss' component is lost. > This patch allows setting a new 'dateAsTimestamp' option when loading a > DataFrame, which will coerce the underlying Date object to a Timestamp so > that the full time component is loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
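The coercion the issue describes can be sketched in plain Java: a `java.sql.Date` that carries a full epoch-millisecond value formats only to 'yyyy-MM-dd', so the time-of-day component is invisible; wrapping the same millisecond value in a `java.sql.Timestamp` preserves it. This is an illustration of the idea only, not Phoenix's actual patch code, and the `coerce` helper name is invented.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class DateAsTimestampSketch {
    // Hypothetical helper mirroring what a 'dateAsTimestamp' coercion must do:
    // carry the full millisecond value over into a Timestamp.
    public static Timestamp coerce(Date date) {
        return new Timestamp(date.getTime());
    }

    public static void main(String[] args) {
        long millis = 1458830096000L; // some instant with a non-midnight time component
        Date date = new Date(millis);
        Timestamp ts = coerce(date);
        // The Date renders only the day; the Timestamp retains hh:mm:ss.
        System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(date));
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts));
    }
}
```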
[jira] [Commented] (PHOENIX-2810) Fixing IndexTool Dependencies
[ https://issues.apache.org/jira/browse/PHOENIX-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230219#comment-15230219 ] maghamravikiran commented on PHOENIX-2810: -- Valid point, [~gabriel.reid]. The patch applies cleanly only on master and breaks on the 4.x-HBase-1.0 and 4.x-HBase-0.98 branches. Holding off on merging this patch. > Fixing IndexTool Dependencies > - > > Key: PHOENIX-2810 > URL: https://issues.apache.org/jira/browse/PHOENIX-2810 > Project: Phoenix > Issue Type: Bug > Reporter: churro morales > Priority: Minor > Labels: HBASEDEPENDENCIES > Attachments: PHOENIX-2810.patch > > > IndexTool uses HFileOutputFormat, which is deprecated. Use HFileOutputFormat2 > instead and fix other private dependencies for this class.
[jira] [Commented] (PHOENIX-2786) Can MultiTableOutputFormat be used instead of MultiHfileOutputFormat
[ https://issues.apache.org/jira/browse/PHOENIX-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204754#comment-15204754 ] maghamravikiran commented on PHOENIX-2786: -- [~churromorales] From what I see, MultiTableOutputFormat uses Put/Delete mutations rather than writing to HFiles as MultiHfileOutputFormat does. We have definitely seen cases, for example with a newly created table, where direct writes to HBase perform far better than bulk load, but in general writing to HFiles performs better. I agree with your valid point that the code in MultiHfileOutputFormat borrows heavily from HFileOutputFormat, apart from a few minor changes. > Can MultiTableOutputFormat be used instead of MultiHfileOutputFormat > > > Key: PHOENIX-2786 > URL: https://issues.apache.org/jira/browse/PHOENIX-2786 > Project: Phoenix > Issue Type: Task > Reporter: churro morales > > MultiHfileOutputFormat depends on a lot of HBase classes that it shouldn't > depend on. It seems like MultiHfileOutputFormat and MultiTableOutputFormat > have the same goal.
[jira] [Assigned] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-418: --- Assignee: maghamravikiran > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task > Reporter: James Taylor > Assignee: maghamravikiran > Labels: gsoc2016 > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that chooses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts).
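To make the bounded-state idea concrete, here is one simple approximate-distinct technique, "linear counting": hash each value into a fixed-size bitmap and estimate the cardinality from the fraction of bits still zero. This is only an illustration of the class of algorithm the issue asks for; it is not what Phoenix ultimately implemented, and the class name and hash choice (splitmix64) are this sketch's own.

```java
import java.util.BitSet;

public class LinearCounter {
    private final BitSet bits;
    private final int m; // bitmap size; fixed memory regardless of cardinality

    public LinearCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // splitmix64: a cheap, well-mixed 64-bit hash.
    private static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    public void add(long value) {
        bits.set((int) Long.remainderUnsigned(hash(value), m));
    }

    // Estimate n as -m * ln(V) where V is the fraction of zero bits.
    public double estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) return m; // bitmap saturated; a larger m is needed
        return -m * Math.log((double) zeros / m);
    }
}
```

Adding the same value twice sets the same bit, so duplicates don't inflate the estimate; the state kept (and shippable from server to client) is just the bitmap.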
[jira] [Comment Edited] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170645#comment-15170645 ] maghamravikiran edited comment on PHOENIX-2649 at 2/27/16 4:29 PM: --- To me it looks like the issue is in the code snippet in [#1], where the mapper output key of TableRowkeyPair includes a table index and rowkey rather than a table name and rowkey. While creating the partitioner path [#2] during job setup, we apparently use a TableRowkeyPair that is a combination of the table name and the rowkey of the table. This mismatch seems to be the root cause of the issue: the TotalOrderPartitioner distributes all mapper output to a single reducer. 1. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274 2. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707 The initial code drop of PHOENIX-2216 didn't introduce this issue. > GC/OOM during BulkLoad > -- > > Key: PHOENIX-2649 > URL: https://issues.apache.org/jira/browse/PHOENIX-2649 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.7.0 > Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2 > Reporter: Sergey Soldatov > Assignee: Sergey Soldatov > Priority: Critical > Fix For: 4.7.0 > > Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, > PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch > > > Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error > during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It > expects that the serialized value was written using zero-compressed encoding, > but at least in my case it was written in the regular way. So, trying to obtain the > lengths of the table name and row key, it always gets zero and reports that those > byte arrays are equal. As a result, the reducer receives all data produced > by the mappers in one reduce call and fails with OOM.
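The encoding mismatch the reporter describes can be reproduced in miniature: a writer that emits a length as a fixed 4-byte int, read back by a comparator expecting a variable-length (zero-compressed) int, sees a leading 0x00 byte and decodes length 0. The varint below is a simplified LEB128-style sketch, not Hadoop's exact WritableUtils format, and the class name is invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthEncodingMismatch {
    // Writer side: a length serialized as a fixed 4-byte big-endian int.
    public static byte[] writeFixedLength(int len) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bos).writeInt(len);
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen with an in-memory stream
        }
        return bos.toByteArray();
    }

    // A simplified varint writer: 7 value bits per byte, high bit = "more follows".
    public static byte[] writeVarInt(int value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            bos.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        bos.write(value);
        return bos.toByteArray();
    }

    // The matching varint reader.
    public static int readVarInt(byte[] buf) {
        int value = 0, shift = 0, i = 0;
        while (true) {
            int b = buf[i++] & 0xFF;
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // Length 10 written as fixed bytes 00 00 00 0A; the varint reader stops
        // at the first byte (0x00) and decodes length 0 -- every key looks equal.
        System.out.println("misread length = " + readVarInt(writeFixedLength(10)));
        // Matched writer/reader round-trips correctly.
        System.out.println("round trip = " + readVarInt(writeVarInt(10)));
    }
}
```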
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170645#comment-15170645 ] maghamravikiran commented on PHOENIX-2649: -- To me it looks like the issue is in the code snippet in [#1], where the mapper output key of TableRowkeyPair includes a table index and rowkey rather than a table name and rowkey. While creating the partitioner path [#2] during job setup, we apparently use a TableRowkeyPair that is a combination of the table name and the rowkey of the table. This mismatch seems to be the root cause of the issue: the TotalOrderPartitioner distributes all mapper output to a single reducer. 1. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274 2. https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707
[jira] [Commented] (PHOENIX-2674) PhoenixMapReduceUtil#setInput doesn't honor condition clause
[ https://issues.apache.org/jira/browse/PHOENIX-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143972#comment-15143972 ] maghamravikiran commented on PHOENIX-2674: -- Good catch, [~jesse_yates], on the missing usage of the condition clause. +1 for the changes. > PhoenixMapReduceUtil#setInput doesn't honor condition clause > > > Key: PHOENIX-2674 > URL: https://issues.apache.org/jira/browse/PHOENIX-2674 > Project: Phoenix > Issue Type: Bug > Reporter: Jesse Yates > Assignee: Jesse Yates > Attachments: PHOENIX-2674.patch, phoenix-2674-v0-without-test.patch > > > The parameter is completely unused in the method. Further, it looks like we > don't actually test this method or any m/r tools directly. > It would be good to (a) have explicit tests for the MapReduce code - rather > than relying on indirect tests like the index util - and, (b) have an example > in code for using the mapreduce tools, rather than just the web docs (which > can become out of date).
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133205#comment-15133205 ] maghamravikiran commented on PHOENIX-2649: -- Pushed the patch. Closing the ticket.
[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2649: - Assignee: Sergey Soldatov (was: maghamravikiran)
[jira] [Resolved] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2649. -- Resolution: Fixed
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132702#comment-15132702 ] maghamravikiran commented on PHOENIX-2649: -- +1 for the changes. The bug was definitely in the usage of vInt in the first place.
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133004#comment-15133004 ] maghamravikiran commented on PHOENIX-2649: -- Thanks [~sergey.soldatov] for the contribution. One minor nit: the static wildcard import below; one of us will address it during check-in. {code} import static org.apache.hadoop.hbase.util.Bytes.*; {code} [~gabriel.reid], [~giacomotaylor] Can I get a go-ahead from one of you before the patch is pushed?
[jira] [Comment Edited] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131130#comment-15131130 ] maghamravikiran edited comment on PHOENIX-2649 at 2/3/16 9:06 PM: -- Thanks [~gabriel.reid], [~sergey.soldatov]. I attached an updated patch. Can you please review? was (Author: maghamraviki...@gmail.com): Uses BytesWritable.Comparator as the comparator.
[jira] [Resolved] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2649. -- Resolution: Fixed
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131259#comment-15131259 ] maghamravikiran commented on PHOENIX-2649: -- I pushed the patch to the 4.x and master branches.
[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2649: - Attachment: PHOENIX-2649-1.patch Uses BytesWritable.Comparator as the comparator.
[jira] [Assigned] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-2649: Assignee: maghamravikiran
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130816#comment-15130816 ] maghamravikiran commented on PHOENIX-2649: -- [~sergey.soldatov] Are you suggesting we have just the following line, or that we restrict ourselves to the compareTo method in the class? I am not familiar with the default Writable comparator, so pardon me. {code} static { WritableComparator.define(TableRowkeyPair.class, new BytesWritable.Comparator()); } {code} In addition, I notice the absence of a hashCode() implementation, which I will add.
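What registering BytesWritable.Comparator buys is a raw lexicographic comparison of the serialized key bytes, with no attempt to parse embedded lengths. The standalone sketch below shows that comparison in plain Java; it mimics what Hadoop's WritableComparator.compareBytes does, but it is an illustration, not Phoenix's or Hadoop's actual code.

```java
import java.nio.charset.StandardCharsets;

public class RawBytesComparatorSketch {
    // Lexicographic comparison of raw bytes, treating each byte as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length; // shorter prefix sorts first
    }

    public static void main(String[] args) {
        // Two serialized (tableName + rowkey) keys; bytes decide the order,
        // so no length fields need to be decoded at all.
        byte[] k1 = "MYTABLE.row1".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "MYTABLE.row2".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareBytes(k1, k2) < 0); // row1 sorts before row2
    }
}
```

Because the comparison never decodes lengths, it sidesteps the fixed-int vs. zero-compressed mismatch entirely.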
[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2649: - Attachment: PHOENIX-2649.patch
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129720#comment-15129720 ] maghamravikiran commented on PHOENIX-2649: -- [~giacomotaylor], No, this is not a regression. The tests in the patch fail without the fix. Can you please review?
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129742#comment-15129742 ] maghamravikiran commented on PHOENIX-2649: -- Thanks [~sergey.soldatov] for identifying the issue. Can you please try applying the patch to see if it fixes the issue you are reporting?
[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad
[ https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129679#comment-15129679 ] maghamravikiran commented on PHOENIX-2649: -- I am working on a patch for this. The comparator definitely isn't working as expected.
[jira] [Commented] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection
[ https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122470#comment-15122470 ] maghamravikiran commented on PHOENIX-1849: -- Sure, [~jamestaylor]. I was waiting on the Jenkins test results. Thanks for closing it. > MemoryLeak in PhoenixFlumePlugin PhoenixConnection > -- > > Key: PHOENIX-1849 > URL: https://issues.apache.org/jira/browse/PHOENIX-1849 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.3.0 > Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98 > Reporter: PeiLiping > Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-1849.patch > > > I got an OOME after using the PhoenixFlumePlugin to write data into HBase for 6 > hours. It looks like the PhoenixConnection never releases the prepared-statements > list, even when I call the commit method manually. > For now I had to close the connection after using it thousands of > times and recreate a new connection later. > This issue is caused by the statements list never being cleared, so the fix could > be to clear the statements once the connection no longer needs them. > Code: > PhoenixConnection.java : 122 > private List statements = new ArrayList();
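The leak pattern the reporter describes, and the shape of the suggested fix, can be modeled in a few lines: a connection-like object that appends every statement to a list grows without bound unless the list is cleared once the statements are no longer needed (e.g., on commit). The class and method names below are invented for illustration; this is not Phoenix's actual PhoenixConnection code.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakFreeConnectionSketch {
    // Mirrors the leaking field: every statement ever created is referenced here.
    private final List<Object> statements = new ArrayList<>();

    public void register(Object statement) {
        statements.add(statement);
    }

    public void commit() {
        // ... flush pending mutations ...
        // The fix shape: drop the references so completed statements can be
        // garbage-collected instead of accumulating for the connection's lifetime.
        statements.clear();
    }

    public int trackedStatements() {
        return statements.size();
    }
}
```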
[jira] [Updated] (PHOENIX-2542) CSV bulk loading with --schema option is broken
[ https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2542: - Fix Version/s: 4.7.0 > CSV bulk loading with --schema option is broken > --- > > Key: PHOENIX-2542 > URL: https://issues.apache.org/jira/browse/PHOENIX-2542 > Project: Phoenix > Issue Type: Bug > Environment: Current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2542.patch > > > My bulk load command looks like this: > {code} > HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop > jar /usr/lib/phoenix/phoenix-client.jar > org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} > --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d > $'\001' > {code} > Got errors as following: > {noformat} > 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : > attempt_1450018293185_0952_m_04_2, Status : FAILED > Error: java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. 
tableName=MYTABLE > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. tableName=MYTABLE > at com.google.common.base.Throwables.propagate(Throwables.java:156) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44) > at > org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147) > ... 9 more > Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 > (42M03): Table undefined. 
tableName=MYTABLE > at > org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436) > at > org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.(FromCompiler.java:285) > at > org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249) > at > org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84) > ... 12 more > {noformat} > My table MYSCHEMA.MYTABLE exists but bulk load tool does not recognize my > schema name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
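The TableNotFoundException above boils down to the MapReduce job resolving MYTABLE without the schema supplied via --schema. A minimal sketch of the kind of fix involved — using the fully qualified name when a schema is present — with a hypothetical helper name (the actual change lives in the phoenix-mapreduce classes):

```java
// Hypothetical helper illustrating the fix: the bulk load job must look up
// the table by its fully qualified name when --schema is supplied.
class QualifiedName {
    static String qualify(String schema, String table) {
        if (schema == null || schema.isEmpty()) {
            return table;
        }
        return schema + "." + table;
    }

    public static void main(String[] args) {
        // With --schema MYSCHEMA --table MYTABLE the lookup key should be:
        System.out.println(qualify("MYSCHEMA", "MYTABLE"));
        // Without --schema, the bare table name is used as before:
        System.out.println(qualify(null, "MYTABLE"));
    }
}
```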
[jira] [Commented] (PHOENIX-2542) CSV bulk loading with --schema option is broken
[ https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121945#comment-15121945 ] maghamravikiran commented on PHOENIX-2542: -- Closing this. > CSV bulk loading with --schema option is broken > --- > > Key: PHOENIX-2542 > URL: https://issues.apache.org/jira/browse/PHOENIX-2542 > Project: Phoenix > Issue Type: Bug > Environment: Current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2542.patch > > > My bulk load command looks like this: > {code} > HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop > jar /usr/lib/phoenix/phoenix-client.jar > org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} > --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d > $'\001' > {code} > Got errors as following: > {noformat} > 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : > attempt_1450018293185_0952_m_04_2, Status : FAILED > Error: java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. 
tableName=MYTABLE > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. tableName=MYTABLE > at com.google.common.base.Throwables.propagate(Throwables.java:156) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44) > at > org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147) > ... 9 more > Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 > (42M03): Table undefined. 
tableName=MYTABLE > at > org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436) > at > org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.(FromCompiler.java:285) > at > org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249) > at > org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84) > ... 12 more > {noformat} > My table MYSCHEMA.MYTABLE exists but bulk load tool does not recognize my > schema name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-2542) CSV bulk loading with --schema option is broken
[ https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2542. -- Resolution: Fixed > CSV bulk loading with --schema option is broken > --- > > Key: PHOENIX-2542 > URL: https://issues.apache.org/jira/browse/PHOENIX-2542 > Project: Phoenix > Issue Type: Bug > Environment: Current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2542.patch > > > My bulk load command looks like this: > {code} > HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop > jar /usr/lib/phoenix/phoenix-client.jar > org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} > --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d > $'\001' > {code} > Got errors as following: > {noformat} > 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : > attempt_1450018293185_0952_m_04_2, Status : FAILED > Error: java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. 
tableName=MYTABLE > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. tableName=MYTABLE > at com.google.common.base.Throwables.propagate(Throwables.java:156) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44) > at > org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147) > ... 9 more > Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 > (42M03): Table undefined. 
tableName=MYTABLE > at > org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436) > at > org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.(FromCompiler.java:285) > at > org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249) > at > org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84) > ... 12 more > {noformat} > My table MYSCHEMA.MYTABLE exists but bulk load tool does not recognize my > schema name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2542) CSV bulk loading with --schema option is broken
[ https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2542: - Attachment: PHOENIX-2542.patch [~jamestaylor], [~gabriel.reid] Can you please review the patch. > CSV bulk loading with --schema option is broken > --- > > Key: PHOENIX-2542 > URL: https://issues.apache.org/jira/browse/PHOENIX-2542 > Project: Phoenix > Issue Type: Bug > Environment: Current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim >Assignee: maghamravikiran > Attachments: PHOENIX-2542.patch > > > My bulk load command looks like this: > {code} > HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop > jar /usr/lib/phoenix/phoenix-client.jar > org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} > --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d > $'\001' > {code} > Got errors as following: > {noformat} > 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : > attempt_1450018293185_0952_m_04_2, Status : FAILED > Error: java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. 
tableName=MYTABLE > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. tableName=MYTABLE > at com.google.common.base.Throwables.propagate(Throwables.java:156) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44) > at > org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147) > ... 9 more > Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 > (42M03): Table undefined. 
tableName=MYTABLE > at > org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436) > at > org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.(FromCompiler.java:285) > at > org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249) > at > org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84) > ... 12 more > {noformat} > My table MYSCHEMA.MYTABLE exists but bulk load tool does not recognize my > schema name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-2542) CSV bulk loading with --schema option is broken
[ https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-2542: Assignee: maghamravikiran > CSV bulk loading with --schema option is broken > --- > > Key: PHOENIX-2542 > URL: https://issues.apache.org/jira/browse/PHOENIX-2542 > Project: Phoenix > Issue Type: Bug > Environment: Current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim >Assignee: maghamravikiran > > My bulk load command looks like this: > {code} > HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop > jar /usr/lib/phoenix/phoenix-client.jar > org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} > --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d > $'\001' > {code} > Got errors as following: > {noformat} > 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : > attempt_1450018293185_0952_m_04_2, Status : FAILED > Error: java.lang.RuntimeException: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. tableName=MYTABLE > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.RuntimeException: > org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table > undefined. 
tableName=MYTABLE > at com.google.common.base.Throwables.propagate(Throwables.java:156) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44) > at > org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133) > at > org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147) > ... 9 more > Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 > (42M03): Table undefined. tableName=MYTABLE > at > org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436) > at > org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.(FromCompiler.java:285) > at > org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249) > at > org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84) > ... 
12 more > {noformat} > My table MYSCHEMA.MYTABLE exists but bulk load tool does not recognize my > schema name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection
[ https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-1849: - Attachment: PHOENIX-1849.patch [~jamestaylor] Can you please review. > MemoryLeak in PhoenixFlumePlugin PhoenixConnection > -- > > Key: PHOENIX-1849 > URL: https://issues.apache.org/jira/browse/PHOENIX-1849 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.3.0 > Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98 >Reporter: PeiLiping >Assignee: maghamravikiran > Fix For: 4.8.0 > > Attachments: PHOENIX-1849.patch > > > I got OOME after using the PhoenixFlumePlugin to write data into HBase for 6 > hours. It looks like the PhoenixConnection never release the prepare > statements list even I call the commit method manually. > Now I had to close the connection after using the connection thousand times > and recreate a new connection later. > This issue is caused by the statements is never be cleared, so the fix could > be clear the statements once the connection doesn't need them. > Code: > PhoenixConnection.java : 122 > private List statements = new ArrayList(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1811) Provide Java Wrappers to the Scala api in phoenix-spark module
[ https://issues.apache.org/jira/browse/PHOENIX-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114531#comment-15114531 ] maghamravikiran commented on PHOENIX-1811: -- [~giacomotaylor] Initially, I wasn't able to use the Scala api from a java program. I will give it a try again and if the Java wrappers aren't necessary, I will close this ticket. > Provide Java Wrappers to the Scala api in phoenix-spark module > -- > > Key: PHOENIX-1811 > URL: https://issues.apache.org/jira/browse/PHOENIX-1811 > Project: Phoenix > Issue Type: New Feature >Reporter: maghamravikiran >Assignee: maghamravikiran > > Create a Java wrapper around the Scala api that has been written as part of > phoenix-spark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection
[ https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114128#comment-15114128 ] maghamravikiran commented on PHOENIX-1849: -- [~jamestaylor] I notice we don't close the PreparedStatement at https://github.com/apache/phoenix/blob/master/phoenix-flume/src/main/java/org/apache/phoenix/flume/serializer/RegexEventSerializer.java#L72 . I will work on providing a patch soon. > MemoryLeak in PhoenixFlumePlugin PhoenixConnection > -- > > Key: PHOENIX-1849 > URL: https://issues.apache.org/jira/browse/PHOENIX-1849 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.3.0 > Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98 >Reporter: PeiLiping >Assignee: maghamravikiran > Fix For: 4.8.0 > > > I got OOME after using the PhoenixFlumePlugin to write data into HBase for 6 > hours. It looks like the PhoenixConnection never release the prepare > statements list even I call the commit method manually. > Now I had to close the connection after using the connection thousand times > and recreate a new connection later. > This issue is caused by the statements is never be cleared, so the fix could > be clear the statements once the connection doesn't need them. > Code: > PhoenixConnection.java : 122 > private List statements = new ArrayList(); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
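The fix direction named in the comment above — closing each PreparedStatement deterministically instead of leaving it for the connection to track — is the standard try-with-resources pattern. The sketch below uses a fake statement class in place of java.sql.PreparedStatement so it runs without a live Phoenix connection; the counter shows that close() runs once per statement:

```java
import java.util.concurrent.atomic.AtomicInteger;

// FakeStatement stands in for java.sql.PreparedStatement so this sketch
// is runnable standalone; it only counts open/closed instances.
class CloseDemo {
    static final AtomicInteger openStatements = new AtomicInteger();

    static class FakeStatement implements AutoCloseable {
        FakeStatement() { openStatements.incrementAndGet(); }
        void executeUpsert() { /* the serializer's UPSERT would run here */ }
        @Override public void close() { openStatements.decrementAndGet(); }
    }

    public static void main(String[] args) {
        // One statement per event, as in the serializer's upsert loop:
        for (int event = 0; event < 100; event++) {
            try (FakeStatement stmt = new FakeStatement()) {
                stmt.executeUpsert();
            } // close() always runs here, even if executeUpsert() throws
        }
        System.out.println("statements still open: " + openStatements.get());
    }
}
```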
[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2584: - Attachment: PHOENIX-2484-3.patch Hopefully the last version of the patch :) [~jamestaylor] , [~prkommireddi] can you please review. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2484-2.patch, PHOENIX-2484-3.patch, > PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2584: - Attachment: PHOENIX-2484-2.patch [~jamestaylor], [~prkommireddi] I have made two changes to the earlier patch. a) Determine the sql type of the column using ColumnProjector. This helps in cases where a column is defined as VARCHAR type but the user uses a REGEX_SPLIT function on that column in the SQL query passed to LOAD. This causes the resultant data type to be of Array type. b) Added two tests , one that tests Arrays in SQL query and the other where the user specifies just the table name in LOAD statement. Can you please have a review. You will notice the order of imports in PhoenixRecordWritable have changed and they match the order of the settings file phoenix.importorder that we have. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2484-2.patch, PHOENIX-2584-1.patch, > PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108016#comment-15108016 ] maghamravikiran commented on PHOENIX-2584: -- [~jamestaylor] I am still working on a fix for the issue I mentioned above. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106883#comment-15106883 ] maghamravikiran edited comment on PHOENIX-2584 at 1/19/16 3:39 PM: --- Thanks [~prkommireddi] for the comments. [~jamestaylor] I didn't push the patch yesterday as I noticed a bug in the code. I am working on fixing it. The test which fail are {code} @Test public void testTimeForSQLQuery() throws Exception { //create the table String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR,DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY)) "; conn.createStatement().execute(ddl); final String dml = "UPSERT INTO TIME_T VALUES('foo',TO_TIME('2008-05-16 00:30:00'))"; conn.createStatement().execute(dml); conn.commit(); //sql query final String sqlQuery = " SELECT mykey, minute(DATE_STP) FROM TIME_T "; pigServer.registerQuery(String.format( "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');", sqlQuery, zkQuorum)); final Iterator iterator = pigServer.openIterator("A"); while (iterator.hasNext()) { Tuple tuple = iterator.next(); assertEquals("foo", tuple.get(0)); assertEquals(30, tuple.get(1)); } } {code} Here , we use a Phoenix Function minute() in the SQL Query. The code in PhoenixHbaseLoader makes a call(added in this patch) to fetch the ColumnInfo of each column in the SELECT expression and is failing to determine the data type for column *minute(DATE_STP)* . I added this call to determine the exact data type of a Phoenix Array and use it in constructing a Pig Tuple. {code} PhoenixHBaseLoader.java private void initializePhoenixPigConfiguration(final String location, final Configuration configuration) throws IOException { ... ... ... // newly added call to get a List. this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config); {code} was (Author: maghamraviki...@gmail.com): Thanks [~prkommireddi] for the comments. 
[~jamestaylor] I didn't push the patch yesterday as I noticed a bug in the code. I am working on fixing it. The test which fail are {code} @Test public void testTimeForSQLQuery() throws Exception { //create the table String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR,DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY)) "; conn.createStatement().execute(ddl); final String dml = "UPSERT INTO TIME_T VALUES('foo',TO_TIME('2008-05-16 00:30:00'))"; conn.createStatement().execute(dml); conn.commit(); //sql query final String sqlQuery = " SELECT mykey, minute(DATE_STP) FROM TIME_T "; pigServer.registerQuery(String.format( "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');", sqlQuery, zkQuorum)); final Iterator iterator = pigServer.openIterator("A"); while (iterator.hasNext()) { Tuple tuple = iterator.next(); assertEquals("foo", tuple.get(0)); assertEquals(30, tuple.get(1)); } } {code} Here , we use a Phoenix Function minute() in the SQL Query. The code in PhoenixHbaseLoader makes a call(added in this patch) to fetch the ColumnInfo of each column in the SELECT expression and is failing to determine the data type for column *minute(DATE_STP)* . I added this call to determine the exact data type of a Phoenix Array. {code} PhoenixHBaseLoader.java private void initializePhoenixPigConfiguration(final String location, final Configuration configuration) throws IOException { ... ... ... // newly added call to get a List. this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config); {code} > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106883#comment-15106883 ] maghamravikiran commented on PHOENIX-2584: -- Thanks [~prkommireddi] for the comments. [~jamestaylor] I didn't push the patch yesterday as I noticed a bug in the code. I am working on fixing it. The test that fails is:
{code}
@Test
public void testTimeForSQLQuery() throws Exception {
    // create the table
    String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR, DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY))";
    conn.createStatement().execute(ddl);
    final String dml = "UPSERT INTO TIME_T VALUES('foo', TO_TIME('2008-05-16 00:30:00'))";
    conn.createStatement().execute(dml);
    conn.commit();
    // sql query
    final String sqlQuery = "SELECT mykey, minute(DATE_STP) FROM TIME_T";
    pigServer.registerQuery(String.format(
            "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');",
            sqlQuery, zkQuorum));
    final Iterator<Tuple> iterator = pigServer.openIterator("A");
    while (iterator.hasNext()) {
        Tuple tuple = iterator.next();
        assertEquals("foo", tuple.get(0));
        assertEquals(30, tuple.get(1));
    }
}
{code}
Here we use the Phoenix function minute() in the SQL query. The code in PhoenixHBaseLoader makes a call (added in this patch) to fetch the ColumnInfo of each column in the SELECT expression, and it fails to determine the data type for the column *minute(DATE_STP)*. I added this call to determine the exact data type of a Phoenix array:
{code}
// PhoenixHBaseLoader.java
private void initializePhoenixPigConfiguration(final String location,
        final Configuration configuration) throws IOException {
    ...
    // newly added call to get a List
    this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config);
}
{code}
> Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2584: - Attachment: PHOENIX-2584-1.patch Thanks [~prkommireddi] for the review. I have an updated patch addressing your comments. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105432#comment-15105432 ] maghamravikiran commented on PHOENIX-2584: --
{quote}"toColumnNameMap" creates a Map with name as the key. How is this used? Could not figure from the code{quote}
To determine the exact data type of the underlying array of the Phoenix object, we construct a map from column name to its ColumnInfo, which is then passed on to construct a Tuple from the Phoenix array:
{code}
private static Tuple newTuple(final ColumnInfo cinfo, Object object) throws ExecException
{code}
> Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
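The name-to-metadata lookup described in that comment can be sketched generically. This is a hypothetical stand-in — `ColumnMeta`, `newTuple`, and the `List`-as-tuple are illustrative names, not Phoenix's ColumnInfo or Pig's Tuple API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ArrayToTupleSketch {
    // Toy stand-in for per-column metadata: the column name plus the
    // Java type of the array's elements.
    static final class ColumnMeta {
        final String name;
        final Class<?> elementType;
        ColumnMeta(String name, Class<?> elementType) {
            this.name = name;
            this.elementType = elementType;
        }
    }

    // Use the column's metadata to unpack a raw array into a typed
    // "tuple" (a List here, standing in for Pig's Tuple).
    static List<Object> newTuple(ColumnMeta meta, Object[] array) {
        List<Object> tuple = new ArrayList<>(array.length);
        for (Object element : array) {
            tuple.add(meta.elementType.cast(element)); // fails fast on a type mismatch
        }
        return tuple;
    }

    public static void main(String[] args) {
        // The name -> metadata map the comment describes.
        Map<String, ColumnMeta> byName = new HashMap<>();
        byName.put("SCORES", new ColumnMeta("SCORES", Integer.class));
        System.out.println(newTuple(byName.get("SCORES"), new Object[]{1, 2, 3})); // [1, 2, 3]
    }
}
```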
[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103814#comment-15103814 ] maghamravikiran commented on PHOENIX-2584: -- Thanks [~jmahonin] for the heads up. I replaced the PhoenixPigDBWritable with PhoenixRecordWritable in the attached patch. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Attachments: PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104017#comment-15104017 ] maghamravikiran commented on PHOENIX-2584: -- Yes [~prkommireddi]. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module
[ https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2584: - Attachment: PHOENIX-2584.patch First code drop to support Array type in phoenix-pig module. [~jamestaylor] , Can you please review. > Support Array datatype in phoenix-pig module > > > Key: PHOENIX-2584 > URL: https://issues.apache.org/jira/browse/PHOENIX-2584 > Project: Phoenix > Issue Type: Bug >Reporter: maghamravikiran >Assignee: maghamravikiran > Attachments: PHOENIX-2584.patch > > > The plan is to map an array data type column to a Tuple in Pig. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: PHOENIX-2433-2.patch > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Fix For: 4.7.0 > > Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, > PHOENIX-2433.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: (was: PHOENIX-2433-2.patch) > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Fix For: 4.7.0 > > Attachments: PHOENIX-2433-firstdrop.patch, PHOENIX-2433.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: PHOENIX-2433-minor-fix.patch > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Fix For: 4.7.0 > > Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, > PHOENIX-2433-minor-fix.patch, PHOENIX-2433.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: PHOENIX-2433-2.patch [~jamestaylor] I have applied the changes you requested for. Can you please review. > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Fix For: 4.7.0 > > Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, > PHOENIX-2433.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: PHOENIX-2433.patch [~jamestaylor] Can you please review it. > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Attachments: PHOENIX-2433-firstdrop.patch, PHOENIX-2433.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-2584) Support Array datatype in phoenix-pig module
maghamravikiran created PHOENIX-2584: Summary: Support Array datatype in phoenix-pig module Key: PHOENIX-2584 URL: https://issues.apache.org/jira/browse/PHOENIX-2584 Project: Phoenix Issue Type: Bug Reporter: maghamravikiran Assignee: maghamravikiran The plan is to map an array data type column to a Tuple in Pig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2433: - Attachment: PHOENIX-2433-firstdrop.patch [~James Taylor], two approaches come to mind.
1. Joda Time: I have a patch that uses this library to provide the necessary functionality. It's simple and just works like a charm.
2. Write the logic ourselves. I initially started off with this approach; to compute the result of FLOOR('date', 'WEEK'), the following code needs to be applied. This will certainly change for CEIL and ROUND.
{code}
Date dateUpserted = new Date();
long divBy = 24 * 60 * 60 * 1000;
long millis = dateUpserted.getTime();
millis = millis + 3 * divBy;   // shift the Thursday-based epoch onto a Monday boundary
millis = millis / (7 * divBy); // truncate to whole weeks
millis = millis * 7 * divBy;
millis = millis - (3 * divBy); // undo the shift
Date flooredDate = new Date(millis);
{code}
Is the first approach OK, or should we stick with the second? I prefer the former as it's simpler and handles all corner cases well. All the tests in RoundFloorCeilFunctionsEnd2EndIT pass with the patch attached.
> support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > Attachments: PHOENIX-2433-firstdrop.patch > > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
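As a standalone sanity check of the arithmetic in approach 2 — assuming UTC and weeks starting on Monday, which is what the +3-day epoch shift implies — the manual floor can be cross-checked against java.time, which offers functionality analogous to what Joda-Time would provide here. This is a sketch, not the patch's code:

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.TemporalAdjusters;

public class WeekFloorCheck {
    // Approach 2 from the comment: floor epoch millis to the start of the
    // week with integer arithmetic. The epoch (1970-01-01) is a Thursday,
    // so adding 3 days shifts it onto a Monday boundary before truncating.
    // (Pre-1970 dates would need floor division rather than truncation.)
    static long floorToWeekMillis(long millis) {
        long day = 24L * 60 * 60 * 1000;
        long shifted = millis + 3 * day;
        return (shifted / (7 * day)) * (7 * day) - 3 * day;
    }

    public static void main(String[] args) {
        Instant in = Instant.parse("2015-11-19T10:30:00Z"); // a Thursday
        Instant floored = Instant.ofEpochMilli(floorToWeekMillis(in.toEpochMilli()));

        // Cross-check with java.time: previous-or-same Monday, midnight UTC.
        LocalDate monday = LocalDate.ofInstant(in, ZoneOffset.UTC)
                .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));

        System.out.println(floored); // 2015-11-16T00:00:00Z
        System.out.println(monday);  // 2015-11-16
    }
}
```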
[jira] [Assigned] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()
[ https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-2433: Assignee: maghamravikiran > support additional time units (like week/month/year) in Trunc() round() and > Ceil() > --- > > Key: PHOENIX-2433 > URL: https://issues.apache.org/jira/browse/PHOENIX-2433 > Project: Phoenix > Issue Type: Improvement >Reporter: noam bulvik >Assignee: maghamravikiran > Labels: newbie > > currently the time units that are supported in trunk(), round(), ceil are > day/hour/minute/seconds/milliseconds. > It should support also other values like week, month, year > You can see how it is documented for Oracle in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and > different supported level in > http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074241#comment-15074241 ] maghamravikiran commented on PHOENIX-2538: -- [~gabriel.reid], [~jamestaylor] I have applied the patch to the 4.x-HBase-0.98, 4.x-HBase-1.0 and master branches only. Please let me know if this should be applied to any other branches that I am missing. > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2540) Same column twice in CREATE TABLE leads unusable state of the table
[ https://issues.apache.org/jira/browse/PHOENIX-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067016#comment-15067016 ] maghamravikiran commented on PHOENIX-2540: -- [~warwithin] A quick test against version 4.6.0 confirms that duplicate columns aren't allowed in the DDL and an exception is thrown.
{code}
@Test
public void testDuplicateColumnNames() throws Exception {
    String ddl = "create table IF NOT EXISTS TEST_DUP_COLUMNS (" +
            " id char(1) NOT NULL," +
            " col1 integer NOT NULL," +
            " col2 integer," +
            " col2 integer, " +
            " CONSTRAINT NAME_PK PRIMARY KEY (id, col1)" +
            " )";
    Connection conn = DriverManager.getConnection(getUrl());
    try {
        conn.createStatement().execute(ddl);
        fail("Duplicate column col2 exists in the ddl");
    } catch (SQLException sqle) {
        assertEquals(SQLExceptionCode.COLUMN_EXIST_IN_DEF.getErrorCode(), sqle.getErrorCode());
    }
}
{code}
> Same column twice in CREATE TABLE leads unusable state of the table > --- > > Key: PHOENIX-2540 > URL: https://issues.apache.org/jira/browse/PHOENIX-2540 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.6.0 > Environment: Phoenix 4.6 and current master branch / HBase 1.1.2 >Reporter: YoungWoo Kim > Fix For: 4.7.0 > > > If users define the same column twice in a table, the table would be unusable. > When I try to drop the table, I get an ArrayIndexOutOfBoundsException as > follows. To prevent this, CREATE TABLE should check for duplicated columns. > E.g., > CREATE TABLE tbl (a varchar not null primary key, b bigint, b bigint, c date); > This DDL works without an error, but it should fail because column 'b' is > defined twice.
> {noformat}
> 2015-12-18 12:11:52,171 ERROR [B.defaultRpcServer.handler=46,queue=4,port=16020] coprocessor.MetaDataEndpointImpl: dropTable failed
> java.lang.ArrayIndexOutOfBoundsException: 20
> at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:380)
> at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:301)
> at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:290)
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:844)
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doDropTable(MetaDataEndpointImpl.java:1450)
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.dropTable(MetaDataEndpointImpl.java:1403)
> at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11629)
> at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
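The check the report asks for boils down to a duplicate-name scan over the DDL column list before the table is created. A minimal sketch of such a check — a hypothetical helper, not Phoenix's actual implementation (which signals COLUMN_EXIST_IN_DEF, as the test above shows):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DupColumnCheck {
    // Return the first duplicated column name (SQL identifiers compared
    // case-insensitively here), or null when all names are distinct.
    static String firstDuplicate(List<String> columns) {
        Set<String> seen = new HashSet<>();
        for (String c : columns) {
            if (!seen.add(c.toUpperCase(Locale.ROOT))) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the DDL in the report: column 'b' appears twice.
        System.out.println(firstDuplicate(Arrays.asList("a", "b", "b", "c"))); // b
        System.out.println(firstDuplicate(Arrays.asList("id", "col1", "col2"))); // null
    }
}
```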
[jira] [Updated] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2538: - Attachment: PHOENIX-2538-1.patch > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065915#comment-15065915 ] maghamravikiran commented on PHOENIX-2538: -- [~gabriel.reid] Thanks for the review. In the newly attached patch I have reverted my change that removed the delete status. I did a cross-check with an old version of CsvBulkLoadTool.java [1], and the import order matches the one I have. :) [1] https://github.com/apache/phoenix/blob/4.3/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-2538: Assignee: maghamravikiran > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2538: - Attachment: PHOENIX-2538.patch > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2538.patch > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails
[ https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065619#comment-15065619 ] maghamravikiran commented on PHOENIX-2538: -- [~gabriel.reid] , [~jamestaylor] Can you please review the patch. > CsvBulkLoadTool should return non-zero exit status if import fails > -- > > Key: PHOENIX-2538 > URL: https://issues.apache.org/jira/browse/PHOENIX-2538 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: maghamravikiran > Fix For: 4.7.0 > > Attachments: PHOENIX-2538.patch > > > The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it > does not correctly return a non-zero error code if the import job fails. This > makes it impossible for users of the tool to automatically determine if the > tool failed (e.g. when running it from shell scripts). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch
[ https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059258#comment-15059258 ] maghamravikiran commented on PHOENIX-2367: -- The changes look good. For better visibility, it would be great to separate the changes for ReserveNSequence.java from the patch. > Change PhoenixRecordWriter to use execute instead of executeBatch > - > > Key: PHOENIX-2367 > URL: https://issues.apache.org/jira/browse/PHOENIX-2367 > Project: Phoenix > Issue Type: Improvement >Reporter: Siddhi Mehta >Assignee: Siddhi Mehta > Fix For: 4.7.0 > > Attachments: PHOENIX-2367.patch > > > Hey All, > I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage, > similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I > did some work on the apache pig code that allows Storers to have a notion of > Customizable/Configurable Errors (PIG-4704). > I wanted to plug this behavior into PhoenixHbaseStorage and propose certain > changes for the same. > Current Behavior/Problem: > PhoenixRecordWriter makes use of executeBatch() to process rows once the batch > size is reached. If there are any client-side validation/syntactical errors, > like data not fitting the column size, executeBatch() throws an exception and > there is no way to retrieve the valid rows from the batch and retry them. We > discard the whole batch or fail the job without error handling. > With auto commit set to false, execute() also serves the purpose of not > making any rpc calls, but does a bunch of validation client side and adds it > to the client cache of mutations. > On conn.commit() we make an rpc call.
> Proposed Change > To be able to use configurable error handling and ignore only the failed > records instead of discarding the whole batch, I want to propose changing the > behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a > configuration to toggle between the two behaviors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
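The batch-versus-per-row trade-off in the description can be illustrated with a toy validator. This is hypothetical code — not the JDBC or Phoenix API — sketching why per-row execute() allows skipping bad records while executeBatch() loses the whole batch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchVsPerRow {
    // Hypothetical client-side check, standing in for the validation
    // (e.g. data not fitting the column size) that throws before any RPC.
    static void validate(String row) {
        if (row.length() > 3) {
            throw new IllegalArgumentException("value too wide: " + row);
        }
    }

    // executeBatch()-style: the first bad row aborts the whole batch,
    // and the valid rows cannot be recovered and retried.
    static List<String> writeBatch(List<String> rows) {
        for (String r : rows) {
            validate(r);
        }
        return new ArrayList<>(rows);
    }

    // execute()-per-row style: each row is validated individually, so a
    // bad row can be logged and skipped while the rest are kept.
    static List<String> writePerRow(List<String> rows) {
        List<String> kept = new ArrayList<>();
        for (String r : rows) {
            try {
                validate(r);
                kept.add(r);
            } catch (IllegalArgumentException e) {
                // log and skip the bad record
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ok", "toolong", "ok2");
        System.out.println(writePerRow(rows)); // [ok, ok2]
        try {
            writeBatch(rows);
        } catch (IllegalArgumentException e) {
            System.out.println("batch aborted: " + e.getMessage());
        }
    }
}
```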
[jira] [Commented] (PHOENIX-2429) PhoenixConfigurationUtil.CURRENT_SCN_VALUE for phoenix-spark plugin does not work
[ https://issues.apache.org/jira/browse/PHOENIX-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011667#comment-15011667 ] maghamravikiran commented on PHOENIX-2429: -- Apparently, the property "phoenix.mr.currentscn.value" is a key that users can set before the job starts. We are currently using it in [1]. I agree, we could rather use 'CurrentSCN'. Curious: would there be any issues if users wanted to use the CurrentSCN only for the input Connection during a SELECT query, but not for the output Connection during an UPSERT query? By setting this value in the Configuration, all Connection objects will respect the SCN property. [1] https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/PhoenixIndexImportMapper.java#L75 > PhoenixConfigurationUtil.CURRENT_SCN_VALUE for phoenix-spark plugin does not > work > - > > Key: PHOENIX-2429 > URL: https://issues.apache.org/jira/browse/PHOENIX-2429 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.2.0, 4.6.0 >Reporter: Diego Fustes Villadóniga > > When I call the method saveToPhoenix to store the contents of a ProductDD, > passing a hadoop configuration where I set > PhoenixConfigurationUtil.CURRENT_SCN_VALUE to a given timestamp, the > values are not stored with that timestamp, but with the server time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2427) Phoenix-Pig tests fails due to timeout
[ https://issues.apache.org/jira/browse/PHOENIX-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012059#comment-15012059 ] maghamravikiran commented on PHOENIX-2427: -- Sure [~mujtabachohan] Will look into now. > Phoenix-Pig tests fails due to timeout > --- > > Key: PHOENIX-2427 > URL: https://issues.apache.org/jira/browse/PHOENIX-2427 > Project: Phoenix > Issue Type: Bug >Reporter: Mujtaba Chohan >Assignee: Mujtaba Chohan >Priority: Minor > > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-failsafe-plugin:2.19:verify > (ClientManagedTimeTests) on project phoenix-pig: There was a timeout or other > error in the fork -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-failsafe-plugin:2.19:verify > (ClientManagedTimeTests) on project phoenix-pig: There was a timeout or other > error in the fork > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 
at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409) > at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param
[ https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995404#comment-14995404 ] maghamravikiran commented on PHOENIX-2373: -- Yes. Thanks [~siddhimehta] ! > Change ReserveNSequence Udf to take in zookeeper and tentantId as param > --- > > Key: PHOENIX-2373 > URL: https://issues.apache.org/jira/browse/PHOENIX-2373 > Project: Phoenix > Issue Type: Improvement >Reporter: Siddhi Mehta >Assignee: Siddhi Mehta >Priority: Minor > Attachments: PHOENIX-2373.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > Currently the UDF reads zookeeper quorum for tuple value and tenantId is > passed in from the jobConf. > Instead wanted to make a change for the UDF to take both zookeeper quorum and > tenantId as params passed to the UDF explicitly -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-763) Support for Sqoop
[ https://issues.apache.org/jira/browse/PHOENIX-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985504#comment-14985504 ] maghamravikiran commented on PHOENIX-763: - Support for integrating Sqoop and Phoenix can be tracked through https://issues.apache.org/jira/browse/SQOOP-2649. Currently, the patch is available for Sqoop 1.4.6. Examples:
{code}
1. sqoop import --connect jdbc:mysql://localhost/test --username root -P --verbose --table employee --phoenix-table EMP
2. sqoop import --connect jdbc:mysql://localhost/test --username root -P --verbose --query "SELECT id AS ID,name AS NAME FROM employee WHERE \$CONDITIONS" --target-dir /tmp/employee --phoenix-table EMP
3. sqoop import --connect jdbc:mysql://localhost/test --username root -P --verbose --query "SELECT rowid,name FROM employee WHERE \$CONDITIONS" --target-dir /tmp/employee --phoenix-table EMP --phoenix-column-mapping "rowid;ID,name;NAME"
4. sqoop import --connect jdbc:mysql://localhost/test --username root -P --verbose --query "SELECT rowid,name FROM employee WHERE \$CONDITIONS" --target-dir /tmp/employee --phoenix-table EMP --phoenix-column-mapping "rowid;ID,name;NAME" --phoenix-bulkload
{code}
Arguments:
--phoenix-table: Required. The phoenix table.
--phoenix-column-mapping: Optional. This property should be specified if the column names between the sqoop table and the phoenix table differ.
--phoenix-bulkload: Optional. Bulk loads data onto the phoenix table.
> Support for Sqoop > - > > Key: PHOENIX-763 > URL: https://issues.apache.org/jira/browse/PHOENIX-763 > Project: Phoenix > Issue Type: Task >Affects Versions: 3.0.0 >Reporter: James Taylor >Assignee: mravi > Labels: patch > Attachments: PHOENIX-763.patch > > > Not sure anything required from our end, but you should be able to use Sqoop > to create and populate Phoenix tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2216. -- Resolution: Fixed Fix Version/s: 4.6.0 > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Fix For: 4.6.0 > > Attachments: mhfileoutput-final.patch, > phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: mhfileoutput-final.patch This patch is a minor rework of the earlier ones, with an additional change to avoid running the CSV bulk load for local indexes. PHOENIX-2334 will track the fix for local indexes. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: mhfileoutput-final.patch, > phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962167#comment-14962167 ] maghamravikiran edited comment on PHOENIX-2216 at 10/18/15 3:55 AM: This patch is a minor rework of the earlier ones, with an additional change to avoid running the CSV bulk load for local indexes. PHOENIX-2334 will track the fix for local indexes. [~jamestaylor] [~gabriel.reid] If possible, can you please take a look? was (Author: maghamraviki...@gmail.com): This patch is a minor rework to the earlier ones with the addition to avoid running CSV Bulk load for local indexes. PHOENIX-2334 will track the fix for local index. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: mhfileoutput-final.patch, > phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-2334) CSV Bulk load fails on local indexes
maghamravikiran created PHOENIX-2334: Summary: CSV Bulk load fails on local indexes Key: PHOENIX-2334 URL: https://issues.apache.org/jira/browse/PHOENIX-2334 Project: Phoenix Issue Type: Bug Reporter: maghamravikiran Assignee: Rajeshbabu Chintaguntla CSV Bulk load fails on local indexes. A quick test for this is
{code}
@Test
public void testImportWithLocalIndex() throws Exception {
    Statement stmt = conn.createStatement();
    stmt.execute("CREATE TABLE TABLE6 (ID INTEGER NOT NULL PRIMARY KEY, "
            + "FIRST_NAME VARCHAR, LAST_NAME VARCHAR) SPLIT ON (1,2)");
    String ddl = "CREATE LOCAL INDEX TABLE6_IDX ON TABLE6 (FIRST_NAME ASC)";
    stmt.execute(ddl);

    FileSystem fs = FileSystem.get(hbaseTestUtil.getConfiguration());
    FSDataOutputStream outputStream = fs.create(new Path("/tmp/input3.csv"));
    PrintWriter printWriter = new PrintWriter(outputStream);
    printWriter.println("1,FirstName 1,LastName 1");
    printWriter.println("2,FirstName 2,LastName 2");
    printWriter.close();

    CsvBulkLoadTool csvBulkLoadTool = new CsvBulkLoadTool();
    csvBulkLoadTool.setConf(hbaseTestUtil.getConfiguration());
    int exitCode = csvBulkLoadTool.run(new String[] {
            "--input", "/tmp/input3.csv",
            "--table", "table6",
            "--zookeeper", zkQuorum });
    assertEquals(0, exitCode);

    ResultSet rs = stmt.executeQuery(
            "SELECT id, FIRST_NAME FROM TABLE6 where first_name='FirstName 2'");
    assertTrue(rs.next());
    assertEquals(2, rs.getInt(1));
    assertEquals("FirstName 2", rs.getString(2));
    rs.close();
    stmt.close();
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: phoenix-tests-split-on.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961612#comment-14961612 ] maghamravikiran commented on PHOENIX-2216: -- The tests which involve local indexes fail when the pre-split option is specified. I have attached the test case. [~jamestaylor] Currently, I have used a custom Writable class (CsvTableRowkeyPair). To get the data onto HBase, I feel we should stick with ImmutableBytesWritable as the reducer output key. Also, we would need the delimiter between the table name and the rowkey to be passed in as a configuration parameter. This way, we can parse the table name and rowkey out of the reducer output key and construct the necessary output path. Let me know if this sounds reasonable. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
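The delimiter-based reducer key proposed in that comment can be sketched roughly as follows. This is a hypothetical illustration, not the actual CsvTableRowkeyPair code from the patch; the class name, method names, and separator byte are all assumptions (a real implementation would read the delimiter from the job configuration and escape any occurrences inside the row key):

```java
import java.nio.charset.StandardCharsets;

public class CompositeKeySketch {

    // Assumed delimiter byte; in the proposal this would come from a
    // configuration parameter rather than being hard-coded.
    static final byte SEPARATOR = 0x00;

    // Build a single byte[] of the form: tableName | SEPARATOR | rowKey.
    static byte[] compose(String tableName, byte[] rowKey) {
        byte[] table = tableName.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[table.length + 1 + rowKey.length];
        System.arraycopy(table, 0, out, 0, table.length);
        out[table.length] = SEPARATOR;
        System.arraycopy(rowKey, 0, out, table.length + 1, rowKey.length);
        return out;
    }

    // Recover the table name; on the reduce side this would select the
    // per-table HFile output path.
    static String tableName(byte[] composite) {
        int sep = indexOfSeparator(composite);
        return new String(composite, 0, sep, StandardCharsets.UTF_8);
    }

    // Recover the original row key bytes that follow the separator.
    static byte[] rowKey(byte[] composite) {
        int sep = indexOfSeparator(composite);
        byte[] row = new byte[composite.length - sep - 1];
        System.arraycopy(composite, sep + 1, row, 0, row.length);
        return row;
    }

    private static int indexOfSeparator(byte[] composite) {
        for (int i = 0; i < composite.length; i++) {
            if (composite[i] == SEPARATOR) {
                return i;
            }
        }
        throw new IllegalArgumentException("no separator found in composite key");
    }
}
```

Wrapping the composed bytes in an ImmutableBytesWritable would then let the reducer emit a plain byte-oriented key while still carrying the table routing information.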
[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648 ] maghamravikiran edited comment on PHOENIX-2216 at 10/17/15 2:00 AM: Forgot to mention. The tests with local indexes fail on the master branch also. was (Author: maghamraviki...@gmail.com): Forgot to mention. The tests with local indexes fail on the master branch also. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648 ] maghamravikiran commented on PHOENIX-2216: -- Forgot to mention. The tests with local indexes fail on the master branch also. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648 ] maghamravikiran edited comment on PHOENIX-2216 at 10/17/15 2:00 AM: Forgot to mention. The tests for local indexes fail on the master branch when I add SPLIT ON. was (Author: maghamraviki...@gmail.com): Forgot to mention. The tests with local indexes fail on the master branch also. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, > phoenix-tests-split-on.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957259#comment-14957259 ] maghamravikiran commented on PHOENIX-2216: -- [~gabriel.reid] I have attached a new patch with comments explicitly marking where the code was changed in the RecordWriter of the custom MultiHfileOutputFormat. You will see comments like phoenix-2216: start and phoenix-2216: end. Regarding the phoenix-multipleoutputs.patch, I believe I uploaded the wrong patch. The tests testBasicImport and testFullOptionImport should work. I will send across the correct patch soon. Sorry about that. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: phoenix-custom-hfileoutputformat-comments.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: (was: phoenix-multipleoutputs.patch) > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: phoenix-multipleoutputs.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat-comments.patch, > phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: phoenix-custom-hfileoutputformat.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat.patch, > phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: phoenix-multipleoutputs.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat.patch, > phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: (was: 2216-wip.patch) > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat.patch, > phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954608#comment-14954608 ] maghamravikiran commented on PHOENIX-2216: -- [~gabriel.reid], [~jamestaylor] I have attached two patch files using different approaches. a) HFileMultioutputFormat [phoenix-custom-hfileoutputformat.patch]: Most of the code is copied over from HFileOutputFormat, with minor tweaks to write the data to different directories based on table and family name. All the integration tests work successfully. :) b) MultipleOutputs [phoenix-multipleoutputs.patch]: The plan is to use MultipleOutputs with HFileOutputFormat2 as the OutputFormat from the Reducer. Tests that involve a single-table bulk load work, since HFileOutputFormat produces the necessary files under the configured job output path; however, for bulk loads of multiple tables, the tests keep failing. Please let me know which of the two approaches we should follow. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: phoenix-custom-hfileoutputformat.patch, > phoenix-multipleoutputs.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
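The per-table directory layout behind approach (a) can be sketched as below. This is purely illustrative; the class and method names are assumptions, not code from phoenix-custom-hfileoutputformat.patch:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HFileOutputPathRouter {

    private final Path jobOutputPath;

    public HFileOutputPathRouter(String jobOutputPath) {
        this.jobOutputPath = Paths.get(jobOutputPath);
    }

    // HFiles for a given (table, column family) pair land under
    // <jobOutput>/<table>/<family>, so the data table and every index
    // table can be written in a single mapper/reducer pass. The bulk-load
    // step can then be pointed at <jobOutput>/<table> for each table,
    // which matches the family-subdirectory layout HBase bulk loading
    // expects per table.
    public Path hfileDir(String tableName, String familyName) {
        return jobOutputPath.resolve(tableName).resolve(familyName);
    }
}
```

Routing each RecordWriter to the directory returned by hfileDir is the "minor tweak" relative to plain HFileOutputFormat, which only writes family subdirectories for a single table under the job output path.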
[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-2216: - Attachment: 2216-wip.patch > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > Attachments: 2216-wip.patch > > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944135#comment-14944135 ] maghamravikiran commented on PHOENIX-2216: -- [~jamestaylor], [~gabriel.reid] I have attached a patch which shows the current state of my work. I have yet to validate whether it works correctly. > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1999) Phoenix Pig Loader does not return data when selecting from multiple tables in a query with a join
[ https://issues.apache.org/jira/browse/PHOENIX-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-1999. -- Resolution: Invalid > Phoenix Pig Loader does not return data when selecting from multiple tables > in a query with a join > -- > > Key: PHOENIX-1999 > URL: https://issues.apache.org/jira/browse/PHOENIX-1999 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.1.0 > Environment: Pig 0.14.3, Hadoop 2.5.2 >Reporter: Seth Brogan >Assignee: maghamravikiran > > The Phoenix Pig Loader does not return data in Pig when selecting specific > columns from multiple tables in a join query. > Example:
> {code}
> DESCRIBE my_table;
> my_table: {a: chararray, my_id: chararray}
> DUMP my_table;
> (abc, 123)
> DESCRIBE join_table;
> join_table: {x: chararray, my_id: chararray}
> DUMP join_table;
> (xyz, 123)
> A = LOAD 'hbase://query/SELECT "t1"."a", "t2"."x" FROM "my_table" AS "t1" JOIN "join_table" AS "t2" ON "t1"."my_id" = "t2"."my_id"' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
> DUMP A;
> (,)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (PHOENIX-1999) Phoenix Pig Loader does not return data when selecting from multiple tables in a query with a join
[ https://issues.apache.org/jira/browse/PHOENIX-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran closed PHOENIX-1999. > Phoenix Pig Loader does not return data when selecting from multiple tables > in a query with a join > -- > > Key: PHOENIX-1999 > URL: https://issues.apache.org/jira/browse/PHOENIX-1999 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.1.0 > Environment: Pig 0.14.3, Hadoop 2.5.2 >Reporter: Seth Brogan >Assignee: maghamravikiran > > The Phoenix Pig Loader does not return data in Pig when selecting specific > columns from multiple tables in a join query. > Example:
> {code}
> DESCRIBE my_table;
> my_table: {a: chararray, my_id: chararray}
> DUMP my_table;
> (abc, 123)
> DESCRIBE join_table;
> join_table: {x: chararray, my_id: chararray}
> DUMP join_table;
> (xyz, 123)
> A = LOAD 'hbase://query/SELECT "t1"."a", "t2"."x" FROM "my_table" AS "t1" JOIN "join_table" AS "t2" ON "t1"."my_id" = "t2"."my_id"' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
> DUMP A;
> (,)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1031) Compile query only once for Pig loader
[ https://issues.apache.org/jira/browse/PHOENIX-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-1031. -- Resolution: Won't Fix > Compile query only once for Pig loader > -- > > Key: PHOENIX-1031 > URL: https://issues.apache.org/jira/browse/PHOENIX-1031 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran >Priority: Minor > > I noticed that the query is compiled a few times in the Pig loader. We > should, if possible, compile it once and hold on to the QueryPlan instead of > compiling it multiple times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (PHOENIX-1031) Compile query only once for Pig loader
[ https://issues.apache.org/jira/browse/PHOENIX-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran closed PHOENIX-1031. > Compile query only once for Pig loader > -- > > Key: PHOENIX-1031 > URL: https://issues.apache.org/jira/browse/PHOENIX-1031 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran >Priority: Minor > > I noticed that the query is compiled a few times in the Pig loader. We > should, if possible, compile it once and hold on to the QueryPlan instead of > compiling it multiple times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-2036) PhoenixConfigurationUtil should provide a pre-normalized table name to PhoenixRuntime
[ https://issues.apache.org/jira/browse/PHOENIX-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2036. -- Resolution: Fixed Folks, I would like to close this ticket as the patches for it have already been pushed to the master, 4.4, and 4.x branches. [~danmeany] kindly open a new issue if you are seeing issues with Spark DataFrames. > PhoenixConfigurationUtil should provide a pre-normalized table name to > PhoenixRuntime > > > Key: PHOENIX-2036 > URL: https://issues.apache.org/jira/browse/PHOENIX-2036 > Project: Phoenix > Issue Type: Bug >Reporter: Siddhi Mehta >Assignee: maghamravikiran >Priority: Minor > Attachments: PHOENIX-2036-spark-v2.patch, PHOENIX-2036-spark.patch, > PHOENIX-2036-v1.patch, PHOENIX-2036-v1.patch, PHOENIX-2036-v2.patch, > PHOENIX-2036.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > I was trying a basic store using PhoenixHBaseStorage and ran into some issues > with it complaining about TableNotFoundException. > The table (CUSTOM_ENTITY."z02") in question exists. > Looking at the stacktrace I think its likely related to the change in > PHOENIX-1682 where phoenix runtime expects a pre-normalized table name. > We need to update > PhoenixConfigurationUtil.getSelectColumnMetadataList(Configuration) to pass a > pre-normalized table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig
[ https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-1464. -- Resolution: Invalid A simple test was written to verify that the PigLoader is able to read an UNSIGNED_FLOAT column. Hence closing this. > IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when > accessed from Pig > -- > > Key: PHOENIX-1464 > URL: https://issues.apache.org/jira/browse/PHOENIX-1464 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: maghamravikiran >Assignee: maghamravikiran > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig
[ https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran updated PHOENIX-1464: - Attachment: PHOENIX-1464-test-case.patch > IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when > accessed from Pig > -- > > Key: PHOENIX-1464 > URL: https://issues.apache.org/jira/browse/PHOENIX-1464 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: maghamravikiran >Assignee: maghamravikiran > Attachments: PHOENIX-1464-test-case.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig
[ https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran closed PHOENIX-1464. > IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when > accessed from Pig > -- > > Key: PHOENIX-1464 > URL: https://issues.apache.org/jira/browse/PHOENIX-1464 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: maghamravikiran >Assignee: maghamravikiran > Attachments: PHOENIX-1464-test-case.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2298) Problem storing with pig on a salted table
[ https://issues.apache.org/jira/browse/PHOENIX-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939840#comment-14939840 ] maghamravikiran commented on PHOENIX-2298: -- This issue was fixed in PHOENIX-2181. Can you share the Phoenix version you are using. > Problem storing with pig on a salted table > -- > > Key: PHOENIX-2298 > URL: https://issues.apache.org/jira/browse/PHOENIX-2298 > Project: Phoenix > Issue Type: Bug >Reporter: Guillaume salou > > When I try to upsert via pigStorage on a salted table I get this error. > Store ... using org.apache.phoenix.pig.PhoenixHBaseStorage(); > first field of the table : > CurrentTime() asINTERNALTS:datetime, > This date is not used in the primary key of the table. > Works perfectly on a non salted table. > Caused by: java.lang.RuntimeException: Unable to process column _SALT:BINARY, > innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203 > (22005): Type mismatch. BINARY cannot be coerced to DATE > at > org.apache.phoenix.pig.writable.PhoenixPigDBWritable.write(PhoenixPigDBWritable.java:66) > at > org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:78) > at > org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39) > at > org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:182) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106) > at > 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.phoenix.schema.ConstraintViolationException: > org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type > mismatch. BINARY cannot be coerced to DATE > at > org.apache.phoenix.schema.types.PDataType.throwConstraintViolationException(PDataType.java:282) > at org.apache.phoenix.schema.types.PDate.toObject(PDate.java:77) > at > org.apache.phoenix.pig.util.TypeUtil.castPigTypeToPhoenix(TypeUtil.java:208) > at > org.apache.phoenix.pig.writable.PhoenixPigDBWritable.convertTypeSpecificValue(PhoenixPigDBWritable.java:79) > at > org.apache.phoenix.pig.writable.PhoenixPigDBWritable.write(PhoenixPigDBWritable.java:59) > ... 21 more > Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 > (22005): Type mismatch. 
BINARY cannot be coerced to DATE > at > org.apache.phoenix.exception.SQLExceptionCode$1.newException(SQLExceptionCode.java:68) > at > org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:133) > ... 26 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
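For context on the fix: PHOENIX-2181 addressed this class of failure by having the Pig storer skip Phoenix's internal salt column instead of trying to coerce it as a data column. A minimal, hypothetical sketch of that idea in plain Java (the `_SALT` name comes from the stack trace above; the helper itself is illustrative, not Phoenix's actual code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SaltColumnFilter {
    // Phoenix prefixes the row keys of salted tables with an internal
    // salt byte, which can surface as a pseudo-column named "_SALT".
    static final String SALT_COLUMN = "_SALT";

    // Drop the internal salt column so that only real data columns are
    // bound to the UPSERT statement the storer builds.
    static List<String> dataColumns(List<String> projected) {
        return projected.stream()
                .filter(c -> !SALT_COLUMN.equals(c))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("_SALT", "INTERNALTS", "STOCK_NAME");
        System.out.println(dataColumns(cols)); // salt column removed
    }
}
```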
[jira] [Commented] (PHOENIX-2287) Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
[ https://issues.apache.org/jira/browse/PHOENIX-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909415#comment-14909415 ] maghamravikiran commented on PHOENIX-2287: -- [~jmahonin] The patch looks good. For the DecimalType, would it be ideal to explicitly specify the precision and scale defaults, as that would ensure the phoenix-spark module works with prior versions of Spark? I notice SYSTEM_DEFAULT was added only in v1.5.0. > Spark Plugin Exception - java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to > org.apache.spark.sql.Row > - > > Key: PHOENIX-2287 > URL: https://issues.apache.org/jira/browse/PHOENIX-2287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.5.2 > Environment: - HBase 1.1.1 running in standalone mode on OS X > - Spark 1.5.0 > - Phoenix 4.5.2 >Reporter: Babar Tareen >Assignee: Josh Mahonin > Attachments: PHOENIX-2287.patch > > > Running the DataFrame example on the Spark Plugin page > (https://phoenix.apache.org/phoenix_spark.html) results in the following > exception. The same code works as expected with Spark 1.4.1. 
> {code:java} > import org.apache.spark.SparkContext > import org.apache.spark.sql.SQLContext > import org.apache.phoenix.spark._ > val sc = new SparkContext("local", "phoenix-test") > val sqlContext = new SQLContext(sc) > val df = sqlContext.load( > "org.apache.phoenix.spark", > Map("table" -> "TABLE1", "zkUrl" -> "127.0.0.1:2181") > ) > df > .filter(df("COL1") === "test_row_1" && df("ID") === 1L) > .select(df("ID")) > .show > {code} > Exception > {quote} > java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to > org.apache.spark.sql.Row > at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) > ~[scala-library-2.11.4.jar:na] > at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) > ~[scala-library-2.11.4.jar:na] > at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) > ~[scala-library-2.11.4.jar:na] > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > ~[spark-sql_2.11-1.5.0.jar:1.5.0] > at > 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:88) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > ~[spark-core_2.11-1.5.0.jar:1.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_45] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
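For context on the precision/scale remark: a Spark SQL DecimalType carries an explicit precision (total digits) and scale (fractional digits), and SYSTEM_DEFAULT in Spark 1.5 is commonly given as (38, 18). A pure-JDK sketch of what pinning those defaults means, using java.math.BigDecimal (the constants are assumptions for illustration, not values read from either API):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalDefaults {
    // Assumed defaults for illustration: 38 total digits, 18 of them
    // fractional. Neither constant is read from Spark or Phoenix here.
    static final int PRECISION = 38;
    static final int SCALE = 18;

    // Coerce a value to the assumed default scale, as an explicit
    // DecimalType(precision, scale) column mapping would.
    static BigDecimal toDefaultScale(BigDecimal v) {
        return v.setScale(SCALE, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("12.5");
        System.out.println(toDefaultScale(v).scale()); // 18
    }
}
```

Specifying the pair explicitly, rather than referencing a constant introduced in 1.5.0, keeps the module compiling against older Spark releases.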
[jira] [Commented] (PHOENIX-2196) phoenix-spark should automatically convert DataFrame field names to all caps
[ https://issues.apache.org/jira/browse/PHOENIX-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908308#comment-14908308 ] maghamravikiran commented on PHOENIX-2196: -- [~jmahonin] The patch looks good. +1. > phoenix-spark should automatically convert DataFrame field names to all caps > > > Key: PHOENIX-2196 > URL: https://issues.apache.org/jira/browse/PHOENIX-2196 > Project: Phoenix > Issue Type: Improvement >Reporter: Randy Gelhausen >Assignee: Josh Mahonin >Priority: Minor > Attachments: PHOENIX-2196-v2.patch, PHOENIX-2196.patch > > > phoenix-spark will fail to save a DF into a Phoenix table if the DataFrame's > fields are not all capitalized. Since Phoenix internally capitalizes all > column names, the DataFrame.save method should automatically capitalize DF > field names as a convenience to the end user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
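For context: Phoenix upper-cases unquoted identifiers internally, so the convenience this patch adds amounts to upper-casing DataFrame field names before matching them to Phoenix columns. A minimal sketch of that transformation in plain Java (the helper name is hypothetical, not the patch's actual code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class ColumnNameNormalizer {
    // Phoenix stores unquoted column names in upper case, so a
    // DataFrame field "id" must be matched against the column "ID".
    static List<String> toPhoenixCase(List<String> fields) {
        return fields.stream()
                .map(f -> f.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toPhoenixCase(Arrays.asList("id", "col1")));
    }
}
```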
[jira] [Assigned] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes
[ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran reassigned PHOENIX-2216: Assignee: maghamravikiran > Support single mapper pass to CSV bulk load table and indexes > - > > Key: PHOENIX-2216 > URL: https://issues.apache.org/jira/browse/PHOENIX-2216 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: maghamravikiran > > Instead of running separate MR jobs for CSV bulk load: once for the table and > then once for each secondary index, generate both the data table HFiles and > the index table(s) HFiles in one mapper phase. > Not sure if we need HBASE-3727 to be implemented for this or if we can do it > with existing HBase APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2231) Support CREATE/DROP SEQUENCE in Phoenix/Calcite Integration
[ https://issues.apache.org/jira/browse/PHOENIX-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736221#comment-14736221 ] maghamravikiran commented on PHOENIX-2231: -- [~maryannxue] Sure. > Support CREATE/DROP SEQUENCE in Phoenix/Calcite Integration > --- > > Key: PHOENIX-2231 > URL: https://issues.apache.org/jira/browse/PHOENIX-2231 > Project: Phoenix > Issue Type: Sub-task >Reporter: Maryann Xue > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2200) Can phoenix support mapreduce with secure hbase(kerberos)?
[ https://issues.apache.org/jira/browse/PHOENIX-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717941#comment-14717941 ] maghamravikiran commented on PHOENIX-2200: -- [~scootli] Can you try adding this one statement before submitting the job and see if it helps? {code} TableMapReduceUtil.initCredentials(job) {code} Can phoenix support mapreduce with secure hbase(kerberos)? --- Key: PHOENIX-2200 URL: https://issues.apache.org/jira/browse/PHOENIX-2200 Project: Phoenix Issue Type: Bug Environment: hbase-0.98.12.1-hadoop2, phoenix-4.5.0-HBase-0.98-bin Reporter: lihuaqing I cannot get Phoenix MapReduce to work with Kerberos. My code is as follows: final Configuration configuration = HBaseConfiguration.create(); configuration.set("hbase.security.authentication", "kerberos"); configuration.set("hadoop.security.authentication", "kerberos"); configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@DATA.SCLOUD"); configuration.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@DATA.SCLOUD"); configuration.set(QueryServices.HBASE_CLIENT_PRINCIPAL, "***"); configuration.set(QueryServices.HBASE_CLIENT_KEYTAB, "***"); final Job job = Job.getInstance(configuration, "phoenix-mr-job"); // We can either specify a selectQuery or ignore it when we would like to retrieve all the columns final String selectQuery = "SELECT STOCK_NAME,RECORDING_YEAR,RECORDINGS_QUARTER FROM STOCK"; // StockWritable is the DBWritable class that enables us to process the Result of the above query PhoenixMapReduceUtil.setInput(job, StockWritable.class, "STOCK", selectQuery); // Set the target Phoenix table and the columns PhoenixMapReduceUtil.setOutput(job, "STOCK_STATS", "STOCK_NAME,MAX_RECORDING"); job.setMapperClass(StockMapper.class); job.setReducerClass(StockReducer.class); job.setOutputFormatClass(PhoenixOutputFormat.class); job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(DoubleWritable.class); job.setOutputKeyClass(NullWritable.class); 
job.setOutputValueClass(StockWritable.class); TableMapReduceUtil.addDependencyJars(job); job.waitForCompletion(true); I get the error stack as follows: 2015-08-24 12:12:15,767 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection. at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125) at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:69) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:512) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish connection. 
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388) at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145) at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:297) at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:180) at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1901) at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1880) at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77) at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1880) at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180) at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132) at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151) at java.sql.DriverManager.getConnection(DriverManager.java:579) at java.sql.DriverManager.getConnection(DriverManager.java:190) at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:93) at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57) at
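The report mixes the security settings with the MR job wiring; pulled out on their own, the security-related entries are plain key/value pairs. A runnable sketch using java.util.Properties, so it runs without HBase on the classpath (the realm and principal values are the reporter's own; the redacted "***" client principal/keytab entries are omitted rather than guessed). The suggested TableMapReduceUtil.initCredentials(job) call then obtains an HBase delegation token for the job using settings like these:

```java
import java.util.Properties;

public class SecureHBaseSettings {
    // Security-related keys from the report, with string literals
    // spelled out. Values are copied from the report verbatim.
    static Properties kerberosSettings() {
        Properties p = new Properties();
        p.setProperty("hbase.security.authentication", "kerberos");
        p.setProperty("hadoop.security.authentication", "kerberos");
        p.setProperty("hbase.master.kerberos.principal", "hbase/_HOST@DATA.SCLOUD");
        p.setProperty("hbase.regionserver.kerberos.principal", "hbase/_HOST@DATA.SCLOUD");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(kerberosSettings().getProperty("hbase.security.authentication"));
    }
}
```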
[jira] [Commented] (PHOENIX-2196) phoenix-spark should automatically convert DataFrame field names to all caps
[ https://issues.apache.org/jira/browse/PHOENIX-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711444#comment-14711444 ] maghamravikiran commented on PHOENIX-2196: -- Thanks [~jmahonin] [~rgelhau] for the work. +1 for the changes. phoenix-spark should automatically convert DataFrame field names to all caps Key: PHOENIX-2196 URL: https://issues.apache.org/jira/browse/PHOENIX-2196 Project: Phoenix Issue Type: Improvement Reporter: Randy Gelhausen Assignee: Josh Mahonin Priority: Minor Attachments: PHOENIX-2196.patch phoenix-spark will fail to save a DF into a Phoenix table if the DataFrame's fields are not all capitalized. Since Phoenix internally capitalizes all column names, the DataFrame.save method should automatically capitalize DF field names as a convenience to the end user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2116) phoenix-flume: Sink/Serializer should be extendable
[ https://issues.apache.org/jira/browse/PHOENIX-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711440#comment-14711440 ] maghamravikiran commented on PHOENIX-2116: -- +1 to the patch. phoenix-flume: Sink/Serializer should be extendable --- Key: PHOENIX-2116 URL: https://issues.apache.org/jira/browse/PHOENIX-2116 Project: Phoenix Issue Type: Improvement Affects Versions: 4.5.0, 4.4.1 Reporter: Josh Mahonin Assignee: Josh Mahonin Attachments: PHOENIX-2116-v2.patch, PHOENIX-2116.patch When using flume, often times custom serializers are necessary to transform data before sending to a sink. The existing Phoenix implementation however makes it difficult to extend and add new functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2031) Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader
[ https://issues.apache.org/jira/browse/PHOENIX-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708113#comment-14708113 ] maghamravikiran commented on PHOENIX-2031: -- The latest build was successful: https://builds.apache.org/job/Phoenix-master/877/. Hence closing this. Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader -- Key: PHOENIX-2031 URL: https://issues.apache.org/jira/browse/PHOENIX-2031 Project: Phoenix Issue Type: Bug Reporter: Alicia Ying Shu Assignee: Alicia Ying Shu Attachments: PHOENIX-2031-v1.patch, PHOENIX-2031-v2.patch, PHOENIX-2031.patch 2015-05-11 15:41:44,419 WARN main org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.pig.PigException: ERROR 0: Error transforming PhoenixRecord to Tuple Cannot convert a Unknown to a java.sql.Timestamp at org.apache.phoenix.pig.util.TypeUtil.transformToTuple(TypeUtil.java:293) at org.apache.phoenix.pig.PhoenixHBaseLoader.getNext(PhoenixHBaseLoader.java:197) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
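The failure above comes from TypeUtil.transformToTuple being unable to map a column whose Phoenix type is unknown onto java.sql.Timestamp. For context, when the type *is* known, mapping a Pig datetime value onto a Timestamp reduces to an epoch-milliseconds conversion; a pure-JDK sketch of that step (an illustrative helper, not Phoenix's actual TypeUtil code):

```java
import java.sql.Timestamp;

public class DateTimeCoercion {
    // Pig's datetime type is backed by Joda-Time; its instant can be
    // expressed as epoch milliseconds, which maps directly onto
    // java.sql.Timestamp via the Timestamp(long) constructor.
    static Timestamp toSqlTimestamp(long epochMillis) {
        return new Timestamp(epochMillis);
    }

    public static void main(String[] args) {
        Timestamp ts = toSqlTimestamp(0L);
        System.out.println(ts.getTime()); // 0
    }
}
```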
[jira] [Resolved] (PHOENIX-2103) Pig tests aren't dropping tables as expected between test runs
[ https://issues.apache.org/jira/browse/PHOENIX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maghamravikiran resolved PHOENIX-2103. -- Resolution: Not A Problem Pig tests aren't dropping tables as expected between test runs -- Key: PHOENIX-2103 URL: https://issues.apache.org/jira/browse/PHOENIX-2103 Project: Phoenix Issue Type: Bug Reporter: James Taylor Assignee: maghamravikiran Attachments: PHOENIX-2013-tests.patch, PHOENIX-2013-v1.patch, PhoenixHBaseLoadIT.java Looks like PhoenixHBaseLoaderIT isn't derived from any of our base test classes (hence it would not drop tables between classes). It should be derived from BaseHBaseManagedTimeIT in which case it would call the @After cleanUpAfterTest() method to drop tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build
[ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707110#comment-14707110 ] maghamravikiran commented on PHOENIX-2154: -- [~rvaleti] I believe you missed the PhoenixTableOutputFormat class in the patch. I am assuming you are updating the index table state in the PhoenixTableOutputFormat class? Failure of one mapper should not affect other mappers in MR index build --- Key: PHOENIX-2154 URL: https://issues.apache.org/jira/browse/PHOENIX-2154 Project: Phoenix Issue Type: Bug Reporter: James Taylor Assignee: maghamravikiran Attachments: IndexTool.java, PHOENIX-2154-WIP.patch, PHOENIX-2154-_HBase_Frontdoor_API_WIP.patch Once a mapper in the MR index job succeeds, it should not need to be re-done in the event of the failure of one of the other mappers. The initial population of an index is based on a snapshot in time, so new rows added *after* the index build has started and/or failed do not impact it. Also, there's a 1:1 correspondence between index rows and table rows, so there's really no need to dedup. However, the index rows will have a different row key than the data table, so I'm not sure how the HFiles are split. Will they potentially overlap and is this an issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build
[ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705319#comment-14705319 ] maghamravikiran commented on PHOENIX-2154: -- My bad. Will move it to the reduce method to be absolutely sure. Failure of one mapper should not affect other mappers in MR index build --- Key: PHOENIX-2154 URL: https://issues.apache.org/jira/browse/PHOENIX-2154 Project: Phoenix Issue Type: Bug Reporter: James Taylor Assignee: maghamravikiran Attachments: IndexTool.java, PHOENIX-2154-WIP.patch Once a mapper in the MR index job succeeds, it should not need to be re-done in the event of the failure of one of the other mappers. The initial population of an index is based on a snapshot in time, so new rows added *after* the index build has started and/or failed do not impact it. Also, there's a 1:1 correspondence between index rows and table rows, so there's really no need to dedup. However, the index rows will have a different row key than the data table, so I'm not sure how the HFiles are split. Will they potentially overlap and is this an issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332)