[jira] [Commented] (PHOENIX-2784) phoenix-spark: Allow coercion of DATE fields to TIMESTAMP when loading DataFrames

2016-04-25 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256709#comment-15256709
 ] 

maghamravikiran commented on PHOENIX-2784:
--

[~jmahonin] The patch looks good.  +1 

> phoenix-spark: Allow coercion of DATE fields to TIMESTAMP when loading 
> DataFrames
> -
>
> Key: PHOENIX-2784
> URL: https://issues.apache.org/jira/browse/PHOENIX-2784
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.7.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
>Priority: Minor
> Attachments: PHOENIX-2784.patch
>
>
> The Phoenix DATE type is internally represented as 8 bytes, which can 
> store a full 'yyyy-MM-dd hh:mm:ss' time component. However, Spark SQL follows 
> the SQL Date spec and keeps only the 'yyyy-MM-dd' portion as a 4-byte type. 
> When loading Phoenix DATE columns using the Spark DataFrame API, the 
> 'hh:mm:ss' component is lost.
> This patch allows setting a new 'dateAsTimestamp' option when loading a 
> DataFrame, which will coerce the underlying Date object to a Timestamp so 
> that the full time component is loaded.
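Not part of the patch itself, but for context, a minimal sketch of how such an option might be used with the Spark 1.x DataFrame API; the data source name follows phoenix-spark's convention, while the table name and ZooKeeper URL are placeholders.

{code}
// Sketch only: loading a Phoenix table as a DataFrame with the new option enabled.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class DateAsTimestampExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("phoenix-spark-example"));
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame df = sqlContext.read()
            .format("org.apache.phoenix.spark")
            .option("table", "MY_TABLE")            // placeholder table name
            .option("zkUrl", "localhost:2181")      // placeholder ZooKeeper quorum
            .option("dateAsTimestamp", "true")      // coerce DATE columns to TIMESTAMP
            .load();

        df.printSchema();  // DATE columns should now appear as TimestampType
    }
}
{code}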



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2810) Fixing IndexTool Dependencies

2016-04-07 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230219#comment-15230219
 ] 

maghamravikiran commented on PHOENIX-2810:
--

Valid point, [~gabriel.reid]. The patch applies cleanly only on master and breaks 
on 4.x-HBase-1.0 and 4.x-HBase-0.98. 
Holding off on merging this patch.

> Fixing IndexTool Dependencies
> -
>
> Key: PHOENIX-2810
> URL: https://issues.apache.org/jira/browse/PHOENIX-2810
> Project: Phoenix
>  Issue Type: Bug
>Reporter: churro morales
>Priority: Minor
>  Labels: HBASEDEPENDENCIES
> Attachments: PHOENIX-2810.patch
>
>
> IndexTool uses HFileOutputFormat, which is deprecated. Use HFileOutputFormat2 
> instead and fix the other private dependencies for this class.
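As a rough illustration (not the patch itself), the replacement boils down to configuring the job with HFileOutputFormat2; this sketch assumes the HBase 1.x client API and uses a placeholder table name.

{code}
// Sketch only: configuring a bulk-load job with HFileOutputFormat2 instead of the
// deprecated HFileOutputFormat.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadJobSetup {
    public static void configure(Job job) throws Exception {
        TableName name = TableName.valueOf("MY_INDEX_TABLE");  // placeholder table
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(name);
             RegionLocator locator = conn.getRegionLocator(name)) {
            // Takes the table and region locator explicitly and writes HFiles for bulk load.
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        }
    }
}
{code}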



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2786) Can MultiTableOutputFormat be used instead of MultiHfileOutputFormat

2016-03-21 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204754#comment-15204754
 ] 

maghamravikiran commented on PHOENIX-2786:
--

[~churromorales]  From what I see, MultiTableOutputFormat uses Put/Delete 
mutations rather than writing to HFiles as MultiHfileOutputFormat does.  We 
have definitely seen cases, for example a newly created table, where direct 
writes to HBase perform far better than a bulk load, but in general writing to 
HFiles performs better. 
I agree with your valid point that the code in MultiHfileOutputFormat 
largely duplicates HFileOutputFormat except for a few minor changes. 
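For context, a minimal sketch (not from Phoenix) of the MultiTableOutputFormat write path being described: the mapper emits the destination table name as the key and a Put/Delete mutation as the value, rather than producing HFiles. Table, row key, and column names are placeholders.

{code}
// Sketch only: a mapper writing live mutations that MultiTableOutputFormat routes by table name.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Mapper;

public class MultiTableWriteMapper extends Mapper<Object, Object, ImmutableBytesWritable, Put> {
    @Override
    protected void map(Object key, Object value, Context context)
            throws IOException, InterruptedException {
        Put put = new Put(Bytes.toBytes("row-1"));  // placeholder row key
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // The output key names the destination table; the job's output format would be
        // set to MultiTableOutputFormat, which applies the mutation to that table.
        context.write(new ImmutableBytesWritable(Bytes.toBytes("TABLE_A")), put);
    }
}
{code}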

> Can MultiTableOutputFormat be used instead of MultiHfileOutputFormat
> 
>
> Key: PHOENIX-2786
> URL: https://issues.apache.org/jira/browse/PHOENIX-2786
> Project: Phoenix
>  Issue Type: Task
>Reporter: churro morales
>
> MultiHfileOutputFormat depends on a lot of HBase classes that it shouldn't 
> depend on.  It seems like MultiHfileOutputFormat and MultiTableOutputFormat 
> have the same goal. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-418) Support approximate COUNT DISTINCT

2016-03-11 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-418:
---

Assignee: maghamravikiran

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: maghamravikiran
>  Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that chooses between exact and approximate. I 
> don't have experience implementing an approximate version, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).
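As a rough illustration of how little per-query state an approximate distinct count can need (compared with holding every distinct value), here is a tiny HyperLogLog-style estimator; it is not tied to Phoenix, the register count and hash are simplified for brevity, and the small-range correction is omitted.

{code}
// Sketch only: a 16-register HyperLogLog-style distinct-count estimator.
import java.util.stream.LongStream;

public class ApproxDistinctSketch {
    private static final int B = 4;               // 2^4 = 16 registers
    private static final int M = 1 << B;
    private static final double ALPHA_16 = 0.673; // bias-correction constant for m = 16
    private final int[] registers = new int[M];

    public void add(long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - B));                // first B bits select a register
        int rho = Long.numberOfLeadingZeros(h << B) + 1; // leftmost 1-bit position in the rest
        if (rho > registers[idx]) registers[idx] = rho;
    }

    public double estimate() {
        double sum = 0;
        for (int r : registers) sum += Math.pow(2, -r);
        return ALPHA_16 * M * M / sum;
    }

    private static long hash(long x) {  // splitmix64-style mixer, enough for a demo
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    public static void main(String[] args) {
        ApproxDistinctSketch sketch = new ApproxDistinctSketch();
        LongStream.rangeClosed(1, 100_000).forEach(sketch::add);
        System.out.printf("estimated distinct count: %.0f (actual 100000)%n", sketch.estimate());
    }
}
{code}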



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-27 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170645#comment-15170645
 ] 

maghamravikiran edited comment on PHOENIX-2649 at 2/27/16 4:29 PM:
---

To me it looks like the issue is in the code snippet at [#1], where the mapper 
output key of TableRowkeyPair is built from a table index and rowkey rather than 
the table name and rowkey.  

While creating the partitioner path [#2] during job setup, we apparently use 
TableRowkeyPair as a combination of the table name and the table's rowkey.  
This mismatch seems to be the root cause of the issue, and the 
TotalOrderPartitioner ends up sending all mapper output to a single reducer. 
1.  
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274
 

2. 
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707

The initial code drop of PHOENIX-2216 didn't introduce this issue. 


was (Author: maghamraviki...@gmail.com):
To me it looks like the issue is in the code snippet at [#1], where the mapper 
output key of TableRowkeyPair is built from a table index and rowkey rather than 
the table name and rowkey.  

While creating the partitioner path [#2] during job setup, we apparently use 
TableRowkeyPair as a combination of the table name and the table's rowkey.  
This mismatch seems to be the root cause of the issue, and the 
TotalOrderPartitioner ends up sending all mapper output to a single reducer. 

1.  
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274
 

2. 
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-27 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170645#comment-15170645
 ] 

maghamravikiran commented on PHOENIX-2649:
--

To me it looks like the issue is in the code snippet at [#1], where the mapper 
output key of TableRowkeyPair is built from a table index and rowkey rather than 
the table name and rowkey.  

While creating the partitioner path [#2] during job setup, we apparently use 
TableRowkeyPair as a combination of the table name and the table's rowkey.  
This mismatch seems to be the root cause of the issue, and the 
TotalOrderPartitioner ends up sending all mapper output to a single reducer. 

1.  
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/FormatToKeyValueMapper.java#L274
 

2. 
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/MultiHfileOutputFormat.java#L707

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2674) PhoenixMapReduceUtil#setInput doesn't honor condition clause

2016-02-11 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143972#comment-15143972
 ] 

maghamravikiran commented on PHOENIX-2674:
--

Good catch, [~jesse_yates], on the missing usage of the condition clause. 
 +1 for the changes. 

> PhoenixMapReduceUtil#setInput doesn't honor condition clause
> 
>
> Key: PHOENIX-2674
> URL: https://issues.apache.org/jira/browse/PHOENIX-2674
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jesse Yates
>Assignee: Jesse Yates
> Attachments: PHOENIX-2674.patch, phoenix-2674-v0-without-test.patch
>
>
> The parameter is completely unused in the method. Further, it looks like we 
> don't actually test this method or any m/r tools directly.
> It would be good to (a) have explicit tests for the MapReduce code - rather 
> than relying on indirect tests like the index util - and (b) have an example 
> in code for using the MapReduce tools, rather than just the web docs (which 
> can become out of date).
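For reference, a minimal sketch of the kind of job setup the description has in mind, passing a condition clause to PhoenixMapReduceUtil.setInput; the record class, table, columns, and condition are hypothetical, and per this issue the condition was silently dropped before the fix.

{code}
// Sketch only: wiring a MapReduce job to read Phoenix rows matching a condition clause.
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class ConditionalInputJob {

    // Hypothetical record class; a real job would map columns to fields here.
    public static class MyRecordWritable implements DBWritable {
        public void write(PreparedStatement statement) { }
        public void readFields(ResultSet resultSet) { }
    }

    public static Job create(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "phoenix-mr-with-condition");
        // Only rows satisfying the condition should be fed to the mappers.
        PhoenixMapReduceUtil.setInput(job, MyRecordWritable.class, "MY_TABLE",
                "CREATED_DATE > CURRENT_DATE() - 7", "ID", "NAME");
        return job;
    }
}
{code}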



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-04 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133205#comment-15133205
 ] 

maghamravikiran commented on PHOENIX-2649:
--

Pushed the patch. Closing the ticket.

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2649:
-
Assignee: Sergey Soldatov  (was: maghamravikiran)

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2649.
--
Resolution: Fixed

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649-4.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-04 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132702#comment-15132702
 ] 

maghamravikiran commented on PHOENIX-2649:
--

+1 for the changes. The bug was definitely in the use of vInt in the first 
place. 
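To make the mismatch concrete, here is an illustrative snippet (not Phoenix code) showing how reading a length as a zero-compressed vInt from bytes that were actually written as a fixed 4-byte int yields 0 for small values, which is exactly the "length is always zero" symptom described in this issue.

{code}
// Sketch only: a fixed-length int, misread as a vInt, looks like length 0.
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.WritableComparator;

public class VIntMismatchDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeInt(7);                       // fixed-length encoding: 00 00 00 07
        byte[] bytes = baos.toByteArray();

        // A raw comparator that assumes zero-compressed encoding reads the first byte
        // as a vInt and gets 0, so it treats both serialized keys as empty and equal.
        int lengthAsVInt = WritableComparator.readVInt(bytes, 0);
        System.out.println("length read as vInt: " + lengthAsVInt);  // prints 0
    }
}
{code}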

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: maghamravikiran
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-04 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133004#comment-15133004
 ] 

maghamravikiran commented on PHOENIX-2649:
--

Thanks, [~sergey.soldatov], for the contribution.  

One minor nit: the static wildcard import below. One of us will address it during check-in. 
{code}
 import static org.apache.hadoop.hbase.util.Bytes.*; 
{code}

[~gabriel.reid], [~giacomotaylor] 

   Can I have a go-ahead from one of you before the patch is pushed? 

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: maghamravikiran
>Priority: Critical
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649-2.patch, 
> PHOENIX-2649-3.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131130#comment-15131130
 ] 

maghamravikiran edited comment on PHOENIX-2649 at 2/3/16 9:06 PM:
--

Thanks, [~gabriel.reid], [~sergey.soldatov]. I have uploaded the latest patch. Can 
you please review it?


was (Author: maghamraviki...@gmail.com):
Uses BytesWritable.Comparator as the comparator.  

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2649.
--
Resolution: Fixed

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: maghamravikiran
>Priority: Critical
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131259#comment-15131259
 ] 

maghamravikiran commented on PHOENIX-2649:
--

I pushed the patch to the 4.x and master branches. 

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2649:
-
Attachment: PHOENIX-2649-1.patch

Uses BytesWritable.Comparator as the comparator.  

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-2649:


Assignee: maghamravikiran

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Assignee: maghamravikiran
>Priority: Critical
> Attachments: PHOENIX-2649-1.patch, PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-03 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130816#comment-15130816
 ] 

maghamravikiran commented on PHOENIX-2649:
--

[~sergey.soldatov] 
Are you suggesting we have just the following lines, or that we restrict the change 
to the compareTo method in the class? I am not familiar with the default Writable 
comparator, so please pardon me.  
{code}
static {
    WritableComparator.define(TableRowkeyPair.class, new BytesWritable.Comparator());
}
{code}

In addition, I notice the absence of a hashCode() implementation, which I will add 
:(. 

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-02 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2649:
-
Attachment: PHOENIX-2649.patch

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-02 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129720#comment-15129720
 ] 

maghamravikiran commented on PHOENIX-2649:
--

[~giacomotaylor], 
 No, this is not a regression. The tests in the patch fail without the 
fix. Can you please review?
  

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-02 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129742#comment-15129742
 ] 

maghamravikiran commented on PHOENIX-2649:
--

Thanks, [~sergey.soldatov], for identifying the issue. Can you please try 
applying the patch and see whether the issue you are reporting is fixed?

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
> Attachments: PHOENIX-2649.patch
>
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2649) GC/OOM during BulkLoad

2016-02-02 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129679#comment-15129679
 ] 

maghamravikiran commented on PHOENIX-2649:
--

I am working on a patch for this. The comparator definitely isn't working as 
expected. 

> GC/OOM during BulkLoad
> --
>
> Key: PHOENIX-2649
> URL: https://issues.apache.org/jira/browse/PHOENIX-2649
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Mac OS, Hadoop 2.7.2, HBase 1.1.2
>Reporter: Sergey Soldatov
>Priority: Critical
>
> Phoenix fails to complete a bulk load of 40 MB of CSV data, hitting a GC heap error 
> during the Reduce phase. The problem is in the comparator for TableRowkeyPair. It 
> expects that the serialized value was written using zero-compressed encoding, 
> but at least in my case it was written in the regular way. So, when trying to obtain 
> the lengths of the table name and row key, it always gets zero and reports that those 
> byte arrays are equal. As a result, the reducer receives all data produced 
> by the mappers in one reduce call and fails with OOM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection

2016-01-28 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122470#comment-15122470
 ] 

maghamravikiran commented on PHOENIX-1849:
--

Sure, [~jamestaylor]. I was waiting for the Jenkins test results to 
come in. Thanks for closing it.

> MemoryLeak in PhoenixFlumePlugin PhoenixConnection
> --
>
> Key: PHOENIX-1849
> URL: https://issues.apache.org/jira/browse/PHOENIX-1849
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.3.0
> Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98
>Reporter: PeiLiping
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-1849.patch
>
>
> I got an OOME after using the PhoenixFlumePlugin to write data into HBase for 6 
> hours. It looks like the PhoenixConnection never releases its list of prepared 
> statements, even when I call the commit method manually. 
> For now I have to close the connection after using it a thousand times 
> and recreate a new connection later.
> This issue is caused by the statements list never being cleared, so the fix could 
> be to clear the statements once the connection no longer needs them.
> Code:
> PhoenixConnection.java : 122 
> private List statements = new ArrayList();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2542) CSV bulk loading with --schema option is broken

2016-01-28 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2542:
-
Fix Version/s: 4.7.0

> CSV bulk loading with --schema option is broken
> ---
>
> Key: PHOENIX-2542
> URL: https://issues.apache.org/jira/browse/PHOENIX-2542
> Project: Phoenix
>  Issue Type: Bug
> Environment: Current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2542.patch
>
>
> My bulk load command looks like this:
> {code}
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop 
> jar /usr/lib/phoenix/phoenix-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} 
> --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d 
> $'\001'
> {code}
> Got errors as follows:
> {noformat}
> 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : 
> attempt_1450018293185_0952_m_04_2, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at com.google.common.base.Throwables.propagate(Throwables.java:156)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>   at 
> org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147)
>   ... 9 more
> Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 
> (42M03): Table undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436)
>   at 
> org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:285)
>   at 
> org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249)
>   at 
> org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84)
>   ... 12 more
> {noformat}
> My table MYSCHEMA.MYTABLE exists, but the bulk load tool does not recognize my 
> schema name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2542) CSV bulk loading with --schema option is broken

2016-01-28 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121945#comment-15121945
 ] 

maghamravikiran commented on PHOENIX-2542:
--

Closing this.

> CSV bulk loading with --schema option is broken
> ---
>
> Key: PHOENIX-2542
> URL: https://issues.apache.org/jira/browse/PHOENIX-2542
> Project: Phoenix
>  Issue Type: Bug
> Environment: Current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2542.patch
>
>
> My bulk load command looks like this:
> {code}
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop 
> jar /usr/lib/phoenix/phoenix-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} 
> --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d 
> $'\001'
> {code}
> Got errors as follows:
> {noformat}
> 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : 
> attempt_1450018293185_0952_m_04_2, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at com.google.common.base.Throwables.propagate(Throwables.java:156)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>   at 
> org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147)
>   ... 9 more
> Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 
> (42M03): Table undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436)
>   at 
> org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:285)
>   at 
> org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249)
>   at 
> org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84)
>   ... 12 more
> {noformat}
> My table MYSCHEMA.MYTABLE exists, but the bulk load tool does not recognize my 
> schema name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2542) CSV bulk loading with --schema option is broken

2016-01-28 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2542.
--
Resolution: Fixed

> CSV bulk loading with --schema option is broken
> ---
>
> Key: PHOENIX-2542
> URL: https://issues.apache.org/jira/browse/PHOENIX-2542
> Project: Phoenix
>  Issue Type: Bug
> Environment: Current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2542.patch
>
>
> My bulk load command looks like this:
> {code}
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop 
> jar /usr/lib/phoenix/phoenix-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} 
> --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d 
> $'\001'
> {code}
> Got errors as follows:
> {noformat}
> 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : 
> attempt_1450018293185_0952_m_04_2, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at com.google.common.base.Throwables.propagate(Throwables.java:156)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>   at 
> org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147)
>   ... 9 more
> Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 
> (42M03): Table undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436)
>   at 
> org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:285)
>   at 
> org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249)
>   at 
> org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84)
>   ... 12 more
> {noformat}
> My table MYSCHEMA.MYTABLE exists, but the bulk load tool does not recognize my 
> schema name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2542) CSV bulk loading with --schema option is broken

2016-01-27 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2542:
-
Attachment: PHOENIX-2542.patch

[~jamestaylor], [~gabriel.reid]
   Can you please review the patch? 

> CSV bulk loading with --schema option is broken
> ---
>
> Key: PHOENIX-2542
> URL: https://issues.apache.org/jira/browse/PHOENIX-2542
> Project: Phoenix
>  Issue Type: Bug
> Environment: Current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
>Assignee: maghamravikiran
> Attachments: PHOENIX-2542.patch
>
>
> My bulk load command looks like this:
> {code}
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop 
> jar /usr/lib/phoenix/phoenix-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} 
> --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d 
> $'\001'
> {code}
> Got errors as follows:
> {noformat}
> 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : 
> attempt_1450018293185_0952_m_04_2, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at com.google.common.base.Throwables.propagate(Throwables.java:156)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>   at 
> org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147)
>   ... 9 more
> Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 
> (42M03): Table undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436)
>   at 
> org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:285)
>   at 
> org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249)
>   at 
> org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84)
>   ... 12 more
> {noformat}
> My table MYSCHEMA.MYTABLE exists, but the bulk load tool does not recognize my 
> schema name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2542) CSV bulk loading with --schema option is broken

2016-01-27 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-2542:


Assignee: maghamravikiran

> CSV bulk loading with --schema option is broken
> ---
>
> Key: PHOENIX-2542
> URL: https://issues.apache.org/jira/browse/PHOENIX-2542
> Project: Phoenix
>  Issue Type: Bug
> Environment: Current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
>Assignee: maghamravikiran
>
> My bulk load command looks like this:
> {code}
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/etc/hbase/conf/ hadoop 
> jar /usr/lib/phoenix/phoenix-client.jar 
> org.apache.phoenix.mapreduce.CsvBulkLoadTool ${HADOOP_MR_RUNTIME_OPTS} 
> --schema MYSCHEMA --table MYTABLE --input /path/to/id=2015121800/* -d 
> $'\001'
> {code}
> Got errors as follows:
> {noformat}
> 15/12/21 11:47:40 INFO mapreduce.Job: Task Id : 
> attempt_1450018293185_0952_m_04_2, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:170)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:61)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: 
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=MYTABLE
>   at com.google.common.base.Throwables.propagate(Throwables.java:156)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper$MapperUpsertListener.errorOnRecord(FormatToKeyValueMapper.java:246)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:92)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
>   at 
> org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
>   at 
> org.apache.phoenix.mapreduce.FormatToKeyValueMapper.map(FormatToKeyValueMapper.java:147)
>   ... 9 more
> Caused by: org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 
> (42M03): Table undefined. tableName=MYTABLE
>   at 
> org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:436)
>   at 
> org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:285)
>   at 
> org.apache.phoenix.compile.FromCompiler.getResolverForMutation(FromCompiler.java:249)
>   at 
> org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:289)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:578)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:566)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:326)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:324)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:245)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
>   at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
>   at 
> org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:84)
>   ... 12 more
> {noformat}
> My table MYSCHEMA.MYTABLE exists, but the bulk load tool does not recognize my 
> schema name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection

2016-01-25 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-1849:
-
Attachment: PHOENIX-1849.patch

[~jamestaylor] Can you please review? 

> MemoryLeak in PhoenixFlumePlugin PhoenixConnection
> --
>
> Key: PHOENIX-1849
> URL: https://issues.apache.org/jira/browse/PHOENIX-1849
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.3.0
> Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98
>Reporter: PeiLiping
>Assignee: maghamravikiran
> Fix For: 4.8.0
>
> Attachments: PHOENIX-1849.patch
>
>
> I got an OOME after using the PhoenixFlumePlugin to write data into HBase for 6 
> hours. It looks like the PhoenixConnection never releases its list of prepared 
> statements, even when I call the commit method manually. 
> For now I have to close the connection after using it a thousand times 
> and recreate a new connection later.
> This issue is caused by the statements list never being cleared, so the fix could 
> be to clear the statements once the connection no longer needs them.
> Code:
> PhoenixConnection.java : 122 
> private List statements = new ArrayList();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1811) Provide Java Wrappers to the Scala api in phoenix-spark module

2016-01-24 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114531#comment-15114531
 ] 

maghamravikiran commented on PHOENIX-1811:
--

[~giacomotaylor]
  Initially, I wasn't able to use the Scala API from a Java program. I will 
give it another try, and if the Java wrappers aren't necessary, I will close 
this ticket.
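
For reference, a minimal sketch (not from any patch) of reading a Phoenix table from Java through the Spark SQL data source; the table name and zkUrl below are placeholders:

{code}
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixSparkFromJava {
    // Loads a Phoenix table as a DataFrame using the phoenix-spark data source.
    public static DataFrame load(SQLContext sqlContext) {
        return sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")          // placeholder table name
                .option("zkUrl", "localhost:2181")  // placeholder ZooKeeper quorum
                .load();
    }
}
{code}

If this works as expected from Java, dedicated wrappers may not be needed.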

> Provide Java Wrappers to the Scala api in phoenix-spark module
> --
>
> Key: PHOENIX-1811
> URL: https://issues.apache.org/jira/browse/PHOENIX-1811
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: maghamravikiran
>Assignee: maghamravikiran
>
> Create a Java wrapper around the Scala api that has been written as part of 
> phoenix-spark module. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1849) MemoryLeak in PhoenixFlumePlugin PhoenixConnection

2016-01-23 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114128#comment-15114128
 ] 

maghamravikiran commented on PHOENIX-1849:
--

[~jamestaylor]
I notice we don't close the PreparedStatement at 
https://github.com/apache/phoenix/blob/master/phoenix-flume/src/main/java/org/apache/phoenix/flume/serializer/RegexEventSerializer.java#L72
. I will work on providing a patch soon.
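
A minimal sketch of the kind of fix being considered, assuming a per-event upsert loop like the serializer's (the SQL and column handling here are placeholders, not the actual RegexEventSerializer code):

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ClosingUpsertExample {
    // Ensures the statement created for a batch is closed, so the connection's
    // internal statement list does not keep growing across batches.
    public void upsertBatch(Connection connection, String upsertSql, String value) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(upsertSql)) {
            stmt.setString(1, value);
            stmt.execute();
            connection.commit();
        } // stmt is closed here even if execute() throws
    }
}
{code}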

> MemoryLeak in PhoenixFlumePlugin PhoenixConnection
> --
>
> Key: PHOENIX-1849
> URL: https://issues.apache.org/jira/browse/PHOENIX-1849
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.3.0
> Environment: JDK 1.7 Flume 1.5.2 Phoenix 4.3.0 HBase 0.98
>Reporter: PeiLiping
>Assignee: maghamravikiran
> Fix For: 4.8.0
>
>
> I got an OOME after using the PhoenixFlumePlugin to write data into HBase for 6 
> hours. It looks like the PhoenixConnection never releases the prepared 
> statements list, even when I call the commit method manually. 
> For now I had to close the connection after using it a thousand times 
> and recreate a new connection later.
> This issue is caused by the statements list never being cleared, so the fix could 
> be to clear the statements once the connection no longer needs them.
> Code:
> PhoenixConnection.java : 122 
> private List statements = new ArrayList();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-21 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2584:
-
Attachment: PHOENIX-2484-3.patch

Hopefully the last version of the patch :)
[~jamestaylor], [~prkommireddi] 
  Can you please review?

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2484-2.patch, PHOENIX-2484-3.patch, 
> PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-21 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2584:
-
Attachment: PHOENIX-2484-2.patch

[~jamestaylor], [~prkommireddi] 
   I have made two changes to the earlier patch.
a) Determine the SQL type of the column using ColumnProjector. This helps in 
cases where a column is defined as VARCHAR but the user applies a REGEXP_SPLIT 
function to that column in the SQL query passed to LOAD, which makes the 
resultant data type an array type.

b) Added two tests, one that tests arrays in the SQL query and the other where the 
user specifies just the table name in the LOAD statement.

Can you please review?

You will notice the order of imports in PhoenixRecordWritable has changed to 
match the order in the phoenix.importorder settings file we have.

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2484-2.patch, PHOENIX-2584-1.patch, 
> PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-19 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108016#comment-15108016
 ] 

maghamravikiran commented on PHOENIX-2584:
--

[~jamestaylor] 
   I am still working on a fix for the issue I mentioned above. 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-19 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106883#comment-15106883
 ] 

maghamravikiran edited comment on PHOENIX-2584 at 1/19/16 3:39 PM:
---

Thanks [~prkommireddi] for the comments.

[~jamestaylor]
   I didn't push the patch yesterday as I noticed a bug in the code. I am 
working on fixing it.  The test that fails is:

{code}
    @Test
    public void testTimeForSQLQuery() throws Exception {
        //create the table
        String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR, DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY))";
        conn.createStatement().execute(ddl);

        final String dml = "UPSERT INTO TIME_T VALUES('foo', TO_TIME('2008-05-16 00:30:00'))";
        conn.createStatement().execute(dml);
        conn.commit();

        //sql query
        final String sqlQuery = " SELECT mykey, minute(DATE_STP) FROM TIME_T ";
        pigServer.registerQuery(String.format(
                "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');",
                sqlQuery, zkQuorum));

        final Iterator<Tuple> iterator = pigServer.openIterator("A");
        while (iterator.hasNext()) {
            Tuple tuple = iterator.next();
            assertEquals("foo", tuple.get(0));
            assertEquals(30, tuple.get(1));
        }
    }
{code}

Here, we use the Phoenix function minute() in the SQL query. The code in 
PhoenixHBaseLoader makes a call (added in this patch) to fetch the ColumnInfo of 
each column in the SELECT expression, and it fails to determine the data type 
for the column *minute(DATE_STP)*.  I added this call to determine the exact data 
type of a Phoenix array and use it in constructing a Pig Tuple.

{code}
// PhoenixHBaseLoader.java
private void initializePhoenixPigConfiguration(final String location, final Configuration configuration) throws IOException {
    ...
    // newly added call to get a List<ColumnInfo>.
    this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config);
}
{code} 


was (Author: maghamraviki...@gmail.com):
Thanks [~prkommireddi] for the comments.

[~jamestaylor]
   I didn't push the patch yesterday as I noticed a bug in the code. I am 
working on fixing it.  The test that fails is:

{code}
    @Test
    public void testTimeForSQLQuery() throws Exception {
        //create the table
        String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR, DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY))";
        conn.createStatement().execute(ddl);

        final String dml = "UPSERT INTO TIME_T VALUES('foo', TO_TIME('2008-05-16 00:30:00'))";
        conn.createStatement().execute(dml);
        conn.commit();

        //sql query
        final String sqlQuery = " SELECT mykey, minute(DATE_STP) FROM TIME_T ";
        pigServer.registerQuery(String.format(
                "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');",
                sqlQuery, zkQuorum));

        final Iterator<Tuple> iterator = pigServer.openIterator("A");
        while (iterator.hasNext()) {
            Tuple tuple = iterator.next();
            assertEquals("foo", tuple.get(0));
            assertEquals(30, tuple.get(1));
        }
    }
{code}

Here, we use the Phoenix function minute() in the SQL query. The code in 
PhoenixHBaseLoader makes a call (added in this patch) to fetch the ColumnInfo of 
each column in the SELECT expression, and it fails to determine the data type 
for the column *minute(DATE_STP)*.  I added this call to determine the exact data 
type of a Phoenix array.

{code}
// PhoenixHBaseLoader.java
private void initializePhoenixPigConfiguration(final String location, final Configuration configuration) throws IOException {
    ...
    // newly added call to get a List<ColumnInfo>.
    this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config);
}
{code} 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-19 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106883#comment-15106883
 ] 

maghamravikiran commented on PHOENIX-2584:
--

Thanks [~prkommireddi] for the comments.

[~jamestaylor]
   I didn't push the patch yesterday as I noticed a bug in the code. I am 
working on fixing it.  The test that fails is:

{code}
    @Test
    public void testTimeForSQLQuery() throws Exception {
        //create the table
        String ddl = "CREATE TABLE TIME_T (MYKEY VARCHAR, DATE_STP TIME CONSTRAINT PK PRIMARY KEY (MYKEY))";
        conn.createStatement().execute(ddl);

        final String dml = "UPSERT INTO TIME_T VALUES('foo', TO_TIME('2008-05-16 00:30:00'))";
        conn.createStatement().execute(dml);
        conn.commit();

        //sql query
        final String sqlQuery = " SELECT mykey, minute(DATE_STP) FROM TIME_T ";
        pigServer.registerQuery(String.format(
                "A = load 'hbase://query/%s' using org.apache.phoenix.pig.PhoenixHBaseLoader('%s');",
                sqlQuery, zkQuorum));

        final Iterator<Tuple> iterator = pigServer.openIterator("A");
        while (iterator.hasNext()) {
            Tuple tuple = iterator.next();
            assertEquals("foo", tuple.get(0));
            assertEquals(30, tuple.get(1));
        }
    }
{code}

Here, we use the Phoenix function minute() in the SQL query. The code in 
PhoenixHBaseLoader makes a call (added in this patch) to fetch the ColumnInfo of 
each column in the SELECT expression, and it fails to determine the data type 
for the column *minute(DATE_STP)*.  I added this call to determine the exact data 
type of a Phoenix array.

{code}
// PhoenixHBaseLoader.java
private void initializePhoenixPigConfiguration(final String location, final Configuration configuration) throws IOException {
    ...
    // newly added call to get a List<ColumnInfo>.
    this.columnInfoList = PhoenixConfigurationUtil.getSelectColumnMetadataList(this.config);
}
{code} 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-18 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2584:
-
Attachment: PHOENIX-2584-1.patch

Thanks [~prkommireddi] for the review.  I have an updated patch addressing your 
comments. 
 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-18 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105432#comment-15105432
 ] 

maghamravikiran commented on PHOENIX-2584:
--

{code}
"toColumnNameMap" creates a Map with name as the key. How is this used? Could 
not figure from the code
{code}

To determine the exact data type of the underlying array of the Phoenix object, 
we construct a map from column name to its ColumnInfo, which is passed on to 
construct a Tuple from the Phoenix array in:
{code} 
 private static Tuple newTuple(final ColumnInfo cinfo, Object object) throws ExecException 
{code}
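
As a rough illustration (not the exact patch code) of what newTuple could do with an array column, assuming an object element type and the loader's existing imports (Tuple, TupleFactory, ExecException, ColumnInfo):

{code}
private static Tuple newTuple(final ColumnInfo cinfo, Object object) throws ExecException {
    try {
        // object is the java.sql.Array returned for a Phoenix array column;
        // getArray() gives the underlying Java array of elements.
        Object[] elements = (Object[]) ((java.sql.Array) object).getArray();
        Tuple tuple = TupleFactory.getInstance().newTuple(elements.length);
        for (int i = 0; i < elements.length; i++) {
            tuple.set(i, elements[i]);
        }
        return tuple;
    } catch (java.sql.SQLException e) {
        throw new ExecException(e);
    }
}
{code}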


> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584-1.patch, PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-17 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103814#comment-15103814
 ] 

maghamravikiran commented on PHOENIX-2584:
--

Thanks [~jmahonin] for the heads up. I replaced the PhoenixPigDBWritable with 
PhoenixRecordWritable in the attached patch.

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Attachments: PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-17 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104017#comment-15104017
 ] 

maghamravikiran commented on PHOENIX-2584:
--

Yes [~prkommireddi]. 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-17 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2584:
-
Attachment: PHOENIX-2584.patch

First code drop to support Array type in phoenix-pig module.  
[~jamestaylor] ,
Can you please review. 

> Support Array datatype in phoenix-pig module
> 
>
> Key: PHOENIX-2584
> URL: https://issues.apache.org/jira/browse/PHOENIX-2584
> Project: Phoenix
>  Issue Type: Bug
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Attachments: PHOENIX-2584.patch
>
>
> The plan is to map an array data type column to a Tuple in Pig. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: PHOENIX-2433-2.patch

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, 
> PHOENIX-2433.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: (was: PHOENIX-2433-2.patch)

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2433-firstdrop.patch, PHOENIX-2433.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: PHOENIX-2433-minor-fix.patch

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, 
> PHOENIX-2433-minor-fix.patch, PHOENIX-2433.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-12 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: PHOENIX-2433-2.patch

[~jamestaylor] 
   I have applied the changes you requested. Can you please review?

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2433-2.patch, PHOENIX-2433-firstdrop.patch, 
> PHOENIX-2433.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-08 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: PHOENIX-2433.patch

[~jamestaylor]
   Can you please review it.

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Attachments: PHOENIX-2433-firstdrop.patch, PHOENIX-2433.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2584) Support Array datatype in phoenix-pig module

2016-01-08 Thread maghamravikiran (JIRA)
maghamravikiran created PHOENIX-2584:


 Summary: Support Array datatype in phoenix-pig module
 Key: PHOENIX-2584
 URL: https://issues.apache.org/jira/browse/PHOENIX-2584
 Project: Phoenix
  Issue Type: Bug
Reporter: maghamravikiran
Assignee: maghamravikiran


The plan is to map an array data type column to a Tuple in Pig. 

  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-05 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2433:
-
Attachment: PHOENIX-2433-firstdrop.patch

[~James Taylor] ,
  Two approaches come to my mind.
   1.  Joda-Time: I have a patch that uses this library to provide the 
necessary functionality. It's simple and just works like a charm.  

   2.  Write the logic ourselves.  I initially started off with this approach, 
and to compute the result of FLOOR('date', 'WEEK') the following code needs 
to be applied. This will definitely change for CEIL and ROUND. 
{code}
Date dateUpserted = new Date();
long divBy = 24 * 60 * 60 * 1000;   // milliseconds in one day
long millis = dateUpserted.getTime();
millis = millis + 3 * divBy;        // shift by 3 days: the epoch (1970-01-01) was a Thursday, ISO weeks start on Monday
millis = millis / (7 * divBy);      // integer-divide down to whole weeks
millis = millis * 7 * divBy;        // back to milliseconds at the week boundary
millis = millis - (3 * divBy);      // undo the shift
Date flooredDate = new Date(millis);
{code}
Is the first approach OK, or should we stick with the second?  I prefer 
the former as it is simpler and handles all the corner cases well.  

All the tests in RoundFloorCeilFunctionsEnd2EndIT pass with the patch attached.
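
For reference, a rough sketch of what the Joda-Time variant of the week case could look like (UTC assumed to mirror the arithmetic above; the class and method names here are illustrative, not necessarily what is in the attached patch):

{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class WeekFloorExample {
    // Truncates a timestamp to the start of its ISO week (Monday 00:00:00.000 UTC).
    public static long floorToWeek(long epochMillis) {
        return new DateTime(epochMillis, DateTimeZone.UTC)
                .weekOfWeekyear()
                .roundFloorCopy()
                .getMillis();
    }
}
{code}

CEIL and ROUND would map to the analogous round*Copy() methods on the same property.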


> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
> Attachments: PHOENIX-2433-firstdrop.patch
>
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2433) support additional time units (like week/month/year) in Trunc() round() and Ceil()

2016-01-03 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-2433:


Assignee: maghamravikiran

> support additional time units (like week/month/year) in Trunc() round() and 
> Ceil() 
> ---
>
> Key: PHOENIX-2433
> URL: https://issues.apache.org/jira/browse/PHOENIX-2433
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: noam bulvik
>Assignee: maghamravikiran
>  Labels: newbie
>
> Currently the time units that are supported in trunc(), round(), and ceil() are 
> day/hour/minute/second/millisecond. 
> It should also support other values like week, month, and year. 
> You can see how it is documented for Oracle in  
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions201.htm and 
> the different supported levels in 
> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions230.htm#i1002084



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-29 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074241#comment-15074241
 ] 

maghamravikiran commented on PHOENIX-2538:
--

[~gabriel.reid] , [~jamestaylor] 
I have applied the patch to the 4.x-HBase-0.98, 4.x-HBase-1.0 and master 
branches only.   Please let me know if this should be applied to any other 
branches that I am missing. 

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2540) Same column twice in CREATE TABLE leads unusable state of the table

2015-12-21 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067016#comment-15067016
 ] 

maghamravikiran commented on PHOENIX-2540:
--

[~warwithin]  A quick test against 4.6.0 confirms that duplicate 
columns aren't allowed in the DDL and an exception is thrown.

{code}
@Test
public void testDuplicateColumnNames() throws Exception {
    String ddl = "create table IF NOT EXISTS TEST_DUP_COLUMNS ("
            + " id char(1) NOT NULL,"
            + " col1 integer NOT NULL,"
            + " col2 integer,"
            + " col2 integer, "
            + " CONSTRAINT NAME_PK PRIMARY KEY (id, col1)"
            + " )";
    Connection conn = DriverManager.getConnection(getUrl());
    try {
        conn.createStatement().execute(ddl);
        fail(" Duplicate column col2 exists in the ddl");
    } catch (SQLException sqle) {
        assertEquals(SQLExceptionCode.COLUMN_EXIST_IN_DEF.getErrorCode(), sqle.getErrorCode());
    }
}
{code}



> Same column twice in CREATE TABLE leads unusable state of the table
> ---
>
> Key: PHOENIX-2540
> URL: https://issues.apache.org/jira/browse/PHOENIX-2540
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.6.0
> Environment: Phoenix 4.6 and current master branch / HBase 1.1.2
>Reporter: YoungWoo Kim
> Fix For: 4.7.0
>
>
> If users define the same column twice in a table, the table becomes unusable. 
> When I try to drop the table, I get an ArrayIndexOutOfBoundsException as 
> shown below. To prevent this, CREATE TABLE should check for duplicated columns. 
> E.g.,
> CREATE TABLE tbl (a varchar not null primary key, b bigint, b bigint, c date);
> This DDL works without an error, but it should fail because column 'b' is 
> defined twice.
> {noformat}
> 2015-12-18 12:11:52,171 ERROR 
> [B.defaultRpcServer.handler=46,queue=4,port=16020] 
> coprocessor.MetaDataEndpointImpl: dropTable failed
> java.lang.ArrayIndexOutOfBoundsException: 20
>   at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:380)
>   at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.java:301)
>   at org.apache.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:290)
>   at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:844)
>   at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
>   at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doDropTable(MetaDataEndpointImpl.java:1450)
>   at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.dropTable(MetaDataEndpointImpl.java:1403)
>   at 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11629)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-20 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2538:
-
Attachment: PHOENIX-2538-1.patch

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-20 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065915#comment-15065915
 ] 

maghamravikiran commented on PHOENIX-2538:
--

[~gabriel.reid] 
Thanks for the review.  In the newly attached patch I have reverted my change that 
removed the status of the delete. 
I did a cross-check with an old version of CsvBulkLoadTool.java [1] and the 
import order matches the one I have.  :)

[1] 
https://github.com/apache/phoenix/blob/4.3/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2538-1.patch, PHOENIX-2538.patch
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-19 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-2538:


Assignee: maghamravikiran

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-19 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2538:
-
Attachment: PHOENIX-2538.patch

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2538.patch
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2538) CsvBulkLoadTool should return non-zero exit status if import fails

2015-12-19 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065619#comment-15065619
 ] 

maghamravikiran commented on PHOENIX-2538:
--

[~gabriel.reid] , [~jamestaylor] Can you please review the patch.

> CsvBulkLoadTool should return non-zero exit status if import fails
> --
>
> Key: PHOENIX-2538
> URL: https://issues.apache.org/jira/browse/PHOENIX-2538
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: maghamravikiran
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2538.patch
>
>
> The changes in PHOENIX-2216 accidentally changed CsvBulkLoadTool so that it 
> does not correctly return a non-zero error code if the import job fails. This 
> makes it impossible for users of the tool to automatically determine if the 
> tool failed (e.g. when running it from shell scripts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch

2015-12-15 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059258#comment-15059258
 ] 

maghamravikiran commented on PHOENIX-2367:
--

 The changes look good. For better visibility, it would be great to separate 
the changes for ReserveNSequence.java from the patch. 
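
For context, a rough sketch (not the patch itself) of the per-record execute() behaviour described below, with auto-commit off so the RPC still happens only at commit time; the error-handling policy shown is just a placeholder:

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PerRecordUpsertExample {
    // Executes each record individually so an invalid row can be skipped,
    // while conn.commit() still flushes all accepted mutations together.
    public static void write(Connection conn, PreparedStatement stmt, Iterable<String> values) throws SQLException {
        conn.setAutoCommit(false);
        for (String value : values) {
            try {
                stmt.setString(1, value);
                stmt.execute();        // client-side validation only; no RPC yet
            } catch (SQLException badRow) {
                // skip (or record) the invalid row instead of failing the whole batch
            }
        }
        conn.commit();                 // sends the surviving mutations
    }
}
{code}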

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -
>
> Key: PHOENIX-2367
> URL: https://issues.apache.org/jira/browse/PHOENIX-2367
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2367.patch
>
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage, 
> similar to how the CSVBulkLoad tool has an option of ignoring bad rows. I 
> did some work on the Apache Pig code that allows Storers to have a notion of 
> customizable/configurable errors (PIG-4704).
> I wanted to plug this behavior into PhoenixHbaseStorage and propose certain 
> changes for the same.
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once the batch 
> size is reached. If there are any client-side validation/syntactical errors, 
> like data not fitting the column size, executeBatch() throws an exception and 
> there is no way to retrieve the valid rows from the batch and retry them. We 
> discard the whole batch or fail the job without error handling.
> With auto-commit set to false, execute() also serves the purpose of not 
> making any RPC calls, but it does a bunch of validation client side and adds the 
> row to the client's cache of mutations.
> On conn.commit() we make an RPC call.
> Proposed Change
> To be able to use configurable error handling and ignore only the failed 
> records instead of discarding the whole batch, I want to propose changing the 
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a 
> configuration to toggle between the two behaviors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2429) PhoenixConfigurationUtil.CURRENT_SCN_VALUE for phoenix-spark plugin does not work

2015-11-18 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011667#comment-15011667
 ] 

maghamravikiran commented on PHOENIX-2429:
--

Apparently, the property "phoenix.mr.currentscn.value" is a key that users can 
set before the job starts. We are currently using it in [1].  I agree, we 
could rather use 'CurrentSCN'. Curious: would there be any issues if users 
wanted to use the CurrentSCN only for the input connection during the SELECT query 
but not for the output connection for an UPSERT query?  By setting this value 
in the Configuration, all Connection objects will respect the SCN property.  


[1] 
https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/PhoenixIndexImportMapper.java#L75
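
For illustration, a minimal sketch of setting this key on the job Configuration before the MR/Spark job starts (the key string is taken from the comment above; the class and method here are placeholders):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ScnConfigExample {
    // Pins the SCN for Phoenix connections opened from this Configuration.
    public static Configuration withCurrentScn(long timestamp) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("phoenix.mr.currentscn.value", Long.toString(timestamp));
        return conf;
    }
}
{code}

As noted above, every connection created from that Configuration (input and output) would then carry the SCN.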

> PhoenixConfigurationUtil.CURRENT_SCN_VALUE for phoenix-spark plugin does not 
> work
> -
>
> Key: PHOENIX-2429
> URL: https://issues.apache.org/jira/browse/PHOENIX-2429
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.2.0, 4.6.0
>Reporter: Diego Fustes Villadóniga
>
> When I call the method saveToPhoenix to store the contents of a ProductDD, 
> passing a Hadoop configuration where I set 
> PhoenixConfigurationUtil.CURRENT_SCN_VALUE to a given timestamp, the 
> values are not stored with that timestamp but with the server time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2427) Phoenix-Pig tests fails due to timeout

2015-11-18 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012059#comment-15012059
 ] 

maghamravikiran commented on PHOENIX-2427:
--

Sure [~mujtabachohan]. Will look into it now.

> Phoenix-Pig tests fails due to timeout 
> ---
>
> Key: PHOENIX-2427
> URL: https://issues.apache.org/jira/browse/PHOENIX-2427
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Mujtaba Chohan
>Assignee: Mujtaba Chohan
>Priority: Minor
>
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-failsafe-plugin:2.19:verify 
> (ClientManagedTimeTests) on project phoenix-pig: There was a timeout or other 
> error in the fork -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-failsafe-plugin:2.19:verify 
> (ClientManagedTimeTests) on project phoenix-pig: There was a timeout or other 
> error in the fork
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
>   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>   at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-07 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995404#comment-14995404
 ] 

maghamravikiran commented on PHOENIX-2373:
--

Yes. Thanks [~siddhimehta] ! 

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads the ZooKeeper quorum from the tuple value, and the tenantId is 
> passed in from the jobConf.
> Instead, we want to change the UDF to take both the ZooKeeper quorum and the 
> tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-763) Support for Sqoop

2015-11-02 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985504#comment-14985504
 ] 

maghamravikiran commented on PHOENIX-763:
-

Support for integrating Sqoop and Phoenix can be tracked through 
https://issues.apache.org/jira/browse/SQOOP-2649.  Currently, the patch is 
available for Sqoop 1.4.6.  

Ex.
1.  sqoop import --connect jdbc:mysql://localhost/test --username root -P 
--verbose --table employee --phoenix-table EMP

2. sqoop import --connect jdbc:mysql://localhost/test --username root -P 
--verbose --query "SELECT id AS ID,name AS NAME FROM employee WHERE 
\$CONDITIONS" --target-dir /tmp/employee --phoenix-table EMP

3. sqoop import --connect jdbc:mysql://localhost/test --username root -P 
--verbose --query "SELECT rowid,name FROM employee WHERE \$CONDITIONS" 
--target-dir /tmp/employee --phoenix-table EMP --phoenix-column-mapping 
"rowid;ID,name;NAME" 

4. sqoop import --connect jdbc:mysql://localhost/test --username root -P 
--verbose --query "SELECT rowid,name FROM employee WHERE \$CONDITIONS" 
--target-dir /tmp/employee --phoenix-table EMP --phoenix-column-mapping 
"rowid;ID,name;NAME" --phoenix-bulkload

Arguments:
--phoenix-table: Required. The Phoenix table.
--phoenix-column-mapping: Optional. This property should be specified if 
the column names between the Sqoop table and the Phoenix table differ. 
--phoenix-bulkload: Optional. Bulk loads the data onto the Phoenix table.  



> Support for Sqoop
> -
>
> Key: PHOENIX-763
> URL: https://issues.apache.org/jira/browse/PHOENIX-763
> Project: Phoenix
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: James Taylor
>Assignee: mravi
>  Labels: patch
> Attachments: PHOENIX-763.patch
>
>
> Not sure anything required from our end, but you should be able to use Sqoop 
> to create and populate Phoenix tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-21 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2216.
--
   Resolution: Fixed
Fix Version/s: 4.6.0

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Fix For: 4.6.0
>
> Attachments: mhfileoutput-final.patch, 
> phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-17 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: mhfileoutput-final.patch

This patch is a minor rework of the earlier ones, with an addition to avoid 
running the CSV bulk load for local indexes.  PHOENIX-2334 will track the fix for 
local indexes.

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: mhfileoutput-final.patch, 
> phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-17 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962167#comment-14962167
 ] 

maghamravikiran edited comment on PHOENIX-2216 at 10/18/15 3:55 AM:


This patch is a minor rework of the earlier ones, with an addition to avoid 
running the CSV bulk load for local indexes.  PHOENIX-2334 will track the fix for 
local indexes.

[~jamestaylor] [~gabriel.reid] 
If possible, can you please take a look.


was (Author: maghamraviki...@gmail.com):
This patch is a minor rework of the earlier ones, with an addition to avoid 
running the CSV bulk load for local indexes.  PHOENIX-2334 will track the fix for 
local indexes.

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: mhfileoutput-final.patch, 
> phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2334) CSV Bulk load fails on local indexes

2015-10-17 Thread maghamravikiran (JIRA)
maghamravikiran created PHOENIX-2334:


 Summary: CSV Bulk load fails on local indexes
 Key: PHOENIX-2334
 URL: https://issues.apache.org/jira/browse/PHOENIX-2334
 Project: Phoenix
  Issue Type: Bug
Reporter: maghamravikiran
Assignee: Rajeshbabu Chintaguntla


CSV Bulk load fails on local indexes. A quick test for this is 

{code}
@Test
public void testImportWithLocalIndex() throws Exception {

    Statement stmt = conn.createStatement();
    stmt.execute("CREATE TABLE TABLE6 (ID INTEGER NOT NULL PRIMARY KEY, " +
            "FIRST_NAME VARCHAR, LAST_NAME VARCHAR) SPLIT ON (1,2)");
    String ddl = "CREATE LOCAL INDEX TABLE6_IDX ON TABLE6 "
            + " (FIRST_NAME ASC)";
    stmt.execute(ddl);

    FileSystem fs = FileSystem.get(hbaseTestUtil.getConfiguration());
    FSDataOutputStream outputStream = fs.create(new Path("/tmp/input3.csv"));
    PrintWriter printWriter = new PrintWriter(outputStream);
    printWriter.println("1,FirstName 1,LastName 1");
    printWriter.println("2,FirstName 2,LastName 2");
    printWriter.close();

    CsvBulkLoadTool csvBulkLoadTool = new CsvBulkLoadTool();
    csvBulkLoadTool.setConf(hbaseTestUtil.getConfiguration());
    int exitCode = csvBulkLoadTool.run(new String[] {
            "--input", "/tmp/input3.csv",
            "--table", "table6",
            "--zookeeper", zkQuorum});
    assertEquals(0, exitCode);

    ResultSet rs = stmt.executeQuery("SELECT id, FIRST_NAME FROM TABLE6 where first_name='FirstName 2'");
    assertTrue(rs.next());
    assertEquals(2, rs.getInt(1));
    assertEquals("FirstName 2", rs.getString(2));

    rs.close();
    stmt.close();
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-16 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: phoenix-tests-split-on.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-16 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961612#comment-14961612
 ] 

maghamravikiran commented on PHOENIX-2216:
--

The tests which involve local indexes fail when the pre-split option is 
specified. I have attached the test case.  

[~jamestaylor]
Currently, I have used a custom Writable class (CsvTableRowkeyPair). To 
get it onto HBase, I feel we should stick with ImmutableBytesWritable as the reducer 
output key.  Also, we would need the delimiter between the table name and rowkey to 
be passed on as a configuration parameter.  This way, we can parse the reducer 
output key into the table name and rowkey and construct the necessary 
output path.  Let me know if this sounds reasonable.  
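
A rough sketch of that alternative, with the delimiter byte as a placeholder for the configured value:

{code}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

public class TableRowkeyKeyExample {
    // Placeholder delimiter; in practice it would come from the job configuration.
    private static final byte[] DELIMITER = new byte[] { 0x00 };

    // Encodes "tableName <delimiter> rowkey" into a single reducer output key,
    // which the output format can split back apart to pick the HFile output path.
    public static ImmutableBytesWritable toReducerKey(String tableName, byte[] rowkey) {
        byte[] composite = Bytes.add(Bytes.toBytes(tableName), DELIMITER, rowkey);
        return new ImmutableBytesWritable(composite);
    }
}
{code}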

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-16 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648
 ] 

maghamravikiran edited comment on PHOENIX-2216 at 10/17/15 2:00 AM:


Forgot to mention. The tests with local indexes fail on the master branch also. 


was (Author: maghamraviki...@gmail.com):
Forgot to mention. The tests with local indexes fail on the master branch also. 

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-16 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648
 ] 

maghamravikiran commented on PHOENIX-2216:
--

Forgot to mention. The tests with local indexes fail on the master branch also. 

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-16 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961648#comment-14961648
 ] 

maghamravikiran edited comment on PHOENIX-2216 at 10/17/15 2:00 AM:


Forgot to mention. The tests for local indexes fail on the master branch when I 
add SPLIT ON.


was (Author: maghamraviki...@gmail.com):
Forgot to mention. The tests with local indexes fail on the master branch also. 

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch, 
> phoenix-tests-split-on.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-14 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957259#comment-14957259
 ] 

maghamravikiran commented on PHOENIX-2216:
--

[~gabriel.reid] I have attached a new patch with comments explicitly marking 
where the code was changed in the RecordWriter of the custom 
MultiHfileOutputFormat. You will see comments like "phoenix-2216: start" and 
"phoenix-2216: end".

Regarding the phoenix-multipleoutputs.patch, I believe I uploaded the wrong 
patch. The tests testBasicImport and testFullOptionImport should work. I will 
send across the corrected patch soon. Sorry about that.


> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-14 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: phoenix-custom-hfileoutputformat-comments.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-14 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: (was: phoenix-multipleoutputs.patch)

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-14 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: phoenix-multipleoutputs.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: phoenix-custom-hfileoutputformat.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: phoenix-multipleoutputs.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-13 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: (was: 2216-wip.patch)

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-13 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954608#comment-14954608
 ] 

maghamravikiran commented on PHOENIX-2216:
--

[~gabriel.reid], [~jamestaylor]
I have attached two patch files using different approaches.
a) HFileMultioutputFormat: [phoenix-custom-hfileoutputformat.patch]
Most of the code is copied over from HFileOutputFormat, with minor tweaks to 
write the data to different directories based on table and family name. All 
the integration tests pass. :)

b) MultipleOutputs: [phoenix-multipleoutputs.patch]
The plan is to use MultipleOutputs with HFileOutputFormat2 as the OutputFormat 
from the reducer. Tests that bulk load a single table work, since the 
HFileOutputFormat produces the necessary files under the configured job output 
path. However, for bulk loads of multiple tables the tests keep failing.

Please let me know which of the two approaches we should follow.
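
Purely as a sketch of how approach (a) keys its writers (the class and field names below are made up for illustration, they are not the patch itself): the RecordWriter lazily resolves a separate output directory per table/column-family pair and reuses it for subsequent cells.

{code:java}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Bytes;

class PerTableWriterRouter {
    private final Path outputDir;
    // One cached directory per "<table>/<family>" pair.
    private final Map<String, Path> writerDirs = new TreeMap<String, Path>();

    PerTableWriterRouter(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Resolves (and caches) the output directory for the given table and cell. */
    Path directoryFor(String tableName, Cell cell) {
        String family = Bytes.toString(CellUtil.cloneFamily(cell));
        String key = tableName + "/" + family;
        Path dir = writerDirs.get(key);
        if (dir == null) {
            dir = new Path(new Path(outputDir, tableName), family);
            writerDirs.put(key, dir);
        }
        return dir;
    }
}
{code}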

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-05 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-2216:
-
Attachment: 2216-wip.patch

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: 2216-wip.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-10-05 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944135#comment-14944135
 ] 

maghamravikiran commented on PHOENIX-2216:
--

[~jamestaylor], [~gabriel.reid]
I have attached a patch which shows the current state of my work. I have yet to 
validate that it works correctly.


> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-1999) Phoenix Pig Loader does not return data when selecting from multiple tables in a query with a join

2015-10-05 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-1999.
--
Resolution: Invalid

> Phoenix Pig Loader does not return data when selecting from multiple tables 
> in a query with a join
> --
>
> Key: PHOENIX-1999
> URL: https://issues.apache.org/jira/browse/PHOENIX-1999
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.1.0
> Environment: Pig 0.14.3, Hadoop 2.5.2
>Reporter: Seth Brogan
>Assignee: maghamravikiran
>
> The Phoenix Pig Loader does not return data in Pig when selecting specific 
> columns from multiple tables in a join query.
> Example:
> {code}
> DESCRIBE my_table;
> my_table: {a: chararray, my_id: chararray}
> DUMP my_table;
> (abc, 123)
> DESCRIBE join_table;
> join_table: {x: chararray, my_id: chararray}
> DUMP join_table;
> (xyz, 123)
> A = LOAD 'hbase://query/SELECT "t1"."a", "t2"."x" FROM "my_table" AS "t1" 
> JOIN "join_table" AS "t2" ON "t1"."my_id" = "t2"."my_id"' using 
> org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
> DUMP A;
> (,)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (PHOENIX-1999) Phoenix Pig Loader does not return data when selecting from multiple tables in a query with a join

2015-10-05 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran closed PHOENIX-1999.


> Phoenix Pig Loader does not return data when selecting from multiple tables 
> in a query with a join
> --
>
> Key: PHOENIX-1999
> URL: https://issues.apache.org/jira/browse/PHOENIX-1999
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.1.0
> Environment: Pig 0.14.3, Hadoop 2.5.2
>Reporter: Seth Brogan
>Assignee: maghamravikiran
>
> The Phoenix Pig Loader does not return data in Pig when selecting specific 
> columns from multiple tables in a join query.
> Example:
> {code}
> DESCRIBE my_table;
> my_table: {a: chararray, my_id: chararray}
> DUMP my_table;
> (abc, 123)
> DESCRIBE join_table;
> join_table: {x: chararray, my_id: chararray}
> DUMP join_table;
> (xyz, 123)
> A = LOAD 'hbase://query/SELECT "t1"."a", "t2"."x" FROM "my_table" AS "t1" 
> JOIN "join_table" AS "t2" ON "t1"."my_id" = "t2"."my_id"' using 
> org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
> DUMP A;
> (,)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-1031) Compile query only once for Pig loader

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-1031.
--
Resolution: Won't Fix

> Compile query only once for Pig loader
> --
>
> Key: PHOENIX-1031
> URL: https://issues.apache.org/jira/browse/PHOENIX-1031
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
>Priority: Minor
>
> I noticed that the query is compiled a few times in the Pig loader. We 
> should, if possible, compile it once and hold on to the QueryPlan instead of 
> compiling it multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (PHOENIX-1031) Compile query only once for Pig loader

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran closed PHOENIX-1031.


> Compile query only once for Pig loader
> --
>
> Key: PHOENIX-1031
> URL: https://issues.apache.org/jira/browse/PHOENIX-1031
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
>Priority: Minor
>
> I noticed that the query is compiled a few times in the Pig loader. We 
> should, if possible, compile it once and hold on to the QueryPlan instead of 
> compiling it multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2036) PhoenixConfigurationUtil should provide a pre-normalize table name to PhoenixRuntime

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2036.
--
Resolution: Fixed

Folks, I would like to close this ticket as the patches for it have already 
been pushed to the master, 4.4, and 4.x branches.

[~danmeany] kindly open a new issue if you are still seeing issues with Spark 
DataFrames.

> PhoenixConfigurationUtil should provide a pre-normalize table name to 
> PhoenixRuntime
> 
>
> Key: PHOENIX-2036
> URL: https://issues.apache.org/jira/browse/PHOENIX-2036
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Siddhi Mehta
>Assignee: maghamravikiran
>Priority: Minor
> Attachments: PHOENIX-2036-spark-v2.patch, PHOENIX-2036-spark.patch, 
> PHOENIX-2036-v1.patch, PHOENIX-2036-v1.patch, PHOENIX-2036-v2.patch, 
> PHOENIX-2036.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I was trying a basic store using PhoenixHBaseStorage and ran into some issues 
> with it complaining about TableNotFoundException.
> The table(CUSTOM_ENTITY."z02") in question exists.
> Looking at the stacktrace I think its likely related to the change in 
> PHOENIX-1682 where phoenix runtime expects a pre-normalized table name.
> We need to update 
> PhoenixConfigurationUtil.getSelectColumnMetadataList(Configuration) be pass a 
> pre-normalized table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-1464.
--
Resolution: Invalid

A simple test was written to prove that the PigLoader is able to pull an 
UNSIGNED_FLOAT column. Hence closing this issue.

> IllegalDataException  is thrown for an UNSIGNED_FLOAT column of phoenix when 
> accessed from Pig
> --
>
> Key: PHOENIX-1464
> URL: https://issues.apache.org/jira/browse/PHOENIX-1464
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: maghamravikiran
>Assignee: maghamravikiran
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran updated PHOENIX-1464:
-
Attachment: PHOENIX-1464-test-case.patch

> IllegalDataException  is thrown for an UNSIGNED_FLOAT column of phoenix when 
> accessed from Pig
> --
>
> Key: PHOENIX-1464
> URL: https://issues.apache.org/jira/browse/PHOENIX-1464
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Attachments: PHOENIX-1464-test-case.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (PHOENIX-1464) IllegalDataException is thrown for an UNSIGNED_FLOAT column of phoenix when accessed from Pig

2015-10-04 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran closed PHOENIX-1464.


> IllegalDataException  is thrown for an UNSIGNED_FLOAT column of phoenix when 
> accessed from Pig
> --
>
> Key: PHOENIX-1464
> URL: https://issues.apache.org/jira/browse/PHOENIX-1464
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: maghamravikiran
>Assignee: maghamravikiran
> Attachments: PHOENIX-1464-test-case.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2298) Problem storing with pig on a salted table

2015-10-01 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939840#comment-14939840
 ] 

maghamravikiran commented on PHOENIX-2298:
--

This issue was fixed in PHOENIX-2181. Can you share the Phoenix version you are 
using?

> Problem storing with pig on a salted table
> --
>
> Key: PHOENIX-2298
> URL: https://issues.apache.org/jira/browse/PHOENIX-2298
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Guillaume salou
>
> When I try to upsert via pigStorage on a salted table I get this error.
> Store ... using org.apache.phoenix.pig.PhoenixHBaseStorage();
> first field of the table :
> CurrentTime() asINTERNALTS:datetime,
> This date is not used in the primary key of the table.
> Works perfectly on a non salted table.
> Caused by: java.lang.RuntimeException: Unable to process column _SALT:BINARY, 
> innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203 
> (22005): Type mismatch. BINARY cannot be coerced to DATE
>   at 
> org.apache.phoenix.pig.writable.PhoenixPigDBWritable.write(PhoenixPigDBWritable.java:66)
>   at 
> org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:78)
>   at 
> org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:39)
>   at 
> org.apache.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:182)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.phoenix.schema.ConstraintViolationException: 
> org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type 
> mismatch. BINARY cannot be coerced to DATE
>   at 
> org.apache.phoenix.schema.types.PDataType.throwConstraintViolationException(PDataType.java:282)
>   at org.apache.phoenix.schema.types.PDate.toObject(PDate.java:77)
>   at 
> org.apache.phoenix.pig.util.TypeUtil.castPigTypeToPhoenix(TypeUtil.java:208)
>   at 
> org.apache.phoenix.pig.writable.PhoenixPigDBWritable.convertTypeSpecificValue(PhoenixPigDBWritable.java:79)
>   at 
> org.apache.phoenix.pig.writable.PhoenixPigDBWritable.write(PhoenixPigDBWritable.java:59)
>   ... 21 more
> Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 
> (22005): Type mismatch. BINARY cannot be coerced to DATE
>   at 
> org.apache.phoenix.exception.SQLExceptionCode$1.newException(SQLExceptionCode.java:68)
>   at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:133)
>   ... 26 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2287) Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row

2015-09-26 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909415#comment-14909415
 ] 

maghamravikiran commented on PHOENIX-2287:
--

[~jmahonin]
The patch looks good.
For the DecimalType, would it be better to explicitly specify the precision 
and scale to the default values, as that would ensure the phoenix-spark module 
works with prior versions of Spark? I notice SYSTEM_DEFAULT was added only in 
Spark 1.5.0.
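
For illustration, a minimal sketch of what I mean (the wrapper class is made up; the 38/18 values simply mirror Spark's own defaults): constructing the DecimalType through DataTypes.createDecimalType keeps the mapping compatible with Spark releases older than 1.5.0.

{code:java}
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

public class DecimalTypeCompat {
    /** Explicit precision/scale instead of DecimalType.SYSTEM_DEFAULT (1.5.0+ only). */
    public static DataType phoenixDecimalType() {
        return DataTypes.createDecimalType(38, 18);
    }
}
{code}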

> Spark Plugin Exception - java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to 
> org.apache.spark.sql.Row
> -
>
> Key: PHOENIX-2287
> URL: https://issues.apache.org/jira/browse/PHOENIX-2287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
> Environment: - HBase 1.1.1 running in standalone mode on OS X
> - Spark 1.5.0
> - Phoenix 4.5.2
>Reporter: Babar Tareen
>Assignee: Josh Mahonin
> Attachments: PHOENIX-2287.patch
>
>
> Running the DataFrame example on Spark Plugin page 
> (https://phoenix.apache.org/phoenix_spark.html) results in following 
> exception. The same code works as expected with Spark 1.4.1.
> {code:java}
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
> val sc = new SparkContext("local", "phoenix-test")
> val sqlContext = new SQLContext(sc)
> val df = sqlContext.load(
>   "org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "127.0.0.1:2181")
> )
> df
>   .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
>   .select(df("ID"))
>   .show
> {code}
> Exception
> {quote}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to 
> org.apache.spark.sql.Row
> at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439) 
> ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) 
> ~[scala-library-2.11.4.jar:na]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) 
> ~[scala-library-2.11.4.jar:na]
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:363) 
> ~[scala-library-2.11.4.jar:na]
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366)
>  ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)
>  ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)
>  ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>  ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)
>  ~[spark-sql_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
>  ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.scheduler.Task.run(Task.scala:88) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
> ~[spark-core_2.11-1.5.0.jar:1.5.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-2196) phoenix-spark should automatically convert DataFrame field names to all caps

2015-09-25 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908308#comment-14908308
 ] 

maghamravikiran commented on PHOENIX-2196:
--

[~jmahonin] 
   The patch looks good.  +1. 

> phoenix-spark should automatically convert DataFrame field names to all caps
> 
>
> Key: PHOENIX-2196
> URL: https://issues.apache.org/jira/browse/PHOENIX-2196
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Randy Gelhausen
>Assignee: Josh Mahonin
>Priority: Minor
> Attachments: PHOENIX-2196-v2.patch, PHOENIX-2196.patch
>
>
> phoenix-spark will fail to save a DF into a Phoenix table if the DataFrame's 
> fields are not all capitalized. Since Phoenix internally capitalizes all 
> column names, the DataFrame.save method should automatically capitalize DF 
> field names as a convenience to the end user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

2015-09-15 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran reassigned PHOENIX-2216:


Assignee: maghamravikiran

> Support single mapper pass to CSV bulk load table and indexes
> -
>
> Key: PHOENIX-2216
> URL: https://issues.apache.org/jira/browse/PHOENIX-2216
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2231) Support CREATE/DROP SEQUENCE in Phoenix/Calcite Integration

2015-09-08 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736221#comment-14736221
 ] 

maghamravikiran commented on PHOENIX-2231:
--

[~maryannxue] Sure.  

> Support CREATE/DROP SEQUENCE in Phoenix/Calcite Integration
> ---
>
> Key: PHOENIX-2231
> URL: https://issues.apache.org/jira/browse/PHOENIX-2231
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2200) Can phoenix support mapreduce with secure hbase(kerberos)?

2015-08-27 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717941#comment-14717941
 ] 

maghamravikiran commented on PHOENIX-2200:
--

[~scootli] Can you try adding this one statement before submitting the job and 
see if it helps?
{code}
    TableMapReduceUtil.initCredentials(job);
{code}
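
Roughly, the call would go just before job submission in your driver; a sketch under that assumption, based on the code you pasted (not verified against your cluster):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class SecureJobSubmit {
    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        // ... Kerberos principal / keytab settings as in your snippet ...
        Job job = Job.getInstance(configuration, "phoenix-mr-job");
        // ... PhoenixMapReduceUtil.setInput / setOutput, mapper, reducer classes ...
        TableMapReduceUtil.initCredentials(job); // obtain HBase delegation tokens
        job.waitForCompletion(true);
    }
}
{code}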

 Can phoenix support mapreduce with secure hbase(kerberos)? 
 ---

 Key: PHOENIX-2200
 URL: https://issues.apache.org/jira/browse/PHOENIX-2200
 Project: Phoenix
  Issue Type: Bug
 Environment: hbase-0.98.12.1-hadoop2phoenix-4.5.0-HBase-0.98-bin
Reporter: lihuaqing

 I cannot get Phoenix MapReduce to work with Kerberos. My code is as follows:
 final Configuration configuration = HBaseConfiguration.create();
 configuration.set("hbase.security.authentication", "kerberos");
 configuration.set("hadoop.security.authentication", "kerberos");
 configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@DATA.SCLOUD");
 configuration.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@DATA.SCLOUD");
 configuration.set(QueryServices.HBASE_CLIENT_PRINCIPAL, "***");
 configuration.set(QueryServices.HBASE_CLIENT_KEYTAB, "***");
 final Job job = Job.getInstance(configuration, "phoenix-mr-job");
 // We can either specify a selectQuery or ignore it when we would like to
 // retrieve all the columns
 final String selectQuery = "SELECT STOCK_NAME,RECORDING_YEAR,RECORDINGS_QUARTER FROM STOCK";
 // StockWritable is the DBWritable class that enables us to process the
 // Result of the above query
 PhoenixMapReduceUtil.setInput(job, StockWritable.class, "STOCK", selectQuery);
 // Set the target Phoenix table and the columns
 PhoenixMapReduceUtil.setOutput(job, "STOCK_STATS", "STOCK_NAME,MAX_RECORDING");
 job.setMapperClass(StockMapper.class);
 job.setReducerClass(StockReducer.class);
 job.setOutputFormatClass(PhoenixOutputFormat.class);
 job.setMapOutputKeyClass(Text.class);
 job.setMapOutputValueClass(DoubleWritable.class);
 job.setOutputKeyClass(NullWritable.class);
 job.setOutputValueClass(StockWritable.class);
 TableMapReduceUtil.addDependencyJars(job);
 job.waitForCompletion(true);
 I get the error stack as follows:
 2015-08-24 12:12:15,767 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.RuntimeException: java.sql.SQLException: 
 ERROR 103 (08004): Unable to establish connection.
   at 
 org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:125)
   at 
 org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:69)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.init(MapTask.java:512)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to establish 
 connection.
   at 
 org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388)
   at 
 org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
   at 
 org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:297)
   at 
 org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:180)
   at 
 org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1901)
   at 
 org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1880)
   at 
 org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
   at 
 org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1880)
   at 
 org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
   at 
 org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
   at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
   at java.sql.DriverManager.getConnection(DriverManager.java:579)
   at java.sql.DriverManager.getConnection(DriverManager.java:190)
   at 
 org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:93)
   at 
 org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
   at 
 

[jira] [Commented] (PHOENIX-2196) phoenix-spark should automatically convert DataFrame field names to all caps

2015-08-25 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711444#comment-14711444
 ] 

maghamravikiran commented on PHOENIX-2196:
--

Thanks [~jmahonin]  [~rgelhau] for the work. +1 for the changes.

 phoenix-spark should automatically convert DataFrame field names to all caps
 

 Key: PHOENIX-2196
 URL: https://issues.apache.org/jira/browse/PHOENIX-2196
 Project: Phoenix
  Issue Type: Improvement
Reporter: Randy Gelhausen
Assignee: Josh Mahonin
Priority: Minor
 Attachments: PHOENIX-2196.patch


 phoenix-spark will fail to save a DF into a Phoenix table if the DataFrame's 
 fields are not all capitalized. Since Phoenix internally capitalizes all 
 column names, the DataFrame.save method should automatically capitalize DF 
 field names as a convenience to the end user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2116) phoenix-flume: Sink/Serializer should be extendable

2015-08-25 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711440#comment-14711440
 ] 

maghamravikiran commented on PHOENIX-2116:
--

+1 to the patch. 

 phoenix-flume: Sink/Serializer should be extendable
 ---

 Key: PHOENIX-2116
 URL: https://issues.apache.org/jira/browse/PHOENIX-2116
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.5.0, 4.4.1
Reporter: Josh Mahonin
Assignee: Josh Mahonin
 Attachments: PHOENIX-2116-v2.patch, PHOENIX-2116.patch


 When using flume, often times custom serializers are necessary to transform 
 data before sending to a sink. The existing Phoenix implementation however 
 makes it difficult to extend and add new functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2031) Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader

2015-08-22 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708113#comment-14708113
 ] 

maghamravikiran commented on PHOENIX-2031:
--

The latest build was successful: 
https://builds.apache.org/job/Phoenix-master/877/. Hence closing this.

 Unable to process timestamp/Date data loaded via Phoenix 
 org.apache.phoenix.pig.PhoenixHBaseLoader
 --

 Key: PHOENIX-2031
 URL: https://issues.apache.org/jira/browse/PHOENIX-2031
 Project: Phoenix
  Issue Type: Bug
Reporter: Alicia Ying Shu
Assignee: Alicia Ying Shu
 Attachments: PHOENIX-2031-v1.patch, PHOENIX-2031-v2.patch, 
 PHOENIX-2031.patch


 2015-05-11 15:41:44,419 WARN main org.apache.hadoop.mapred.YarnChild: 
 Exception running child : org.apache.pig.PigException: ERROR 0: Error 
 transforming PhoenixRecord to Tuple Cannot convert a Unknown to a 
 java.sql.Timestamp at 
 org.apache.phoenix.pig.util.TypeUtil.transformToTuple(TypeUtil.java:293)
 at 
 org.apache.phoenix.pig.PhoenixHBaseLoader.getNext(PhoenixHBaseLoader.java:197)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2103) Pig tests aren't dropping tables as expected between test runs

2015-08-22 Thread maghamravikiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maghamravikiran resolved PHOENIX-2103.
--
Resolution: Not A Problem

 Pig tests aren't dropping tables as expected between test runs
 --

 Key: PHOENIX-2103
 URL: https://issues.apache.org/jira/browse/PHOENIX-2103
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: maghamravikiran
 Attachments: PHOENIX-2013-tests.patch, PHOENIX-2013-v1.patch, 
 PhoenixHBaseLoadIT.java


 Looks like PhoenixHBaseLoaderIT isn't derived from any of our base test 
 classes (hence it would not drop tables between classes). It should be 
 derived from BaseHBaseManagedTimeIT in which case it would call the @After 
 cleanUpAfterTest() method to drop tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-21 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707110#comment-14707110
 ] 

maghamravikiran commented on PHOENIX-2154:
--

[~rvaleti] I believe you missed the PhoenixTableOutputFormat class in the 
patch. I am assuming you are updating the index table state in the 
PhoenixTableOutputFormat class?

 Failure of one mapper should not affect other mappers in MR index build
 ---

 Key: PHOENIX-2154
 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: maghamravikiran
 Attachments: IndexTool.java, PHOENIX-2154-WIP.patch, 
 PHOENIX-2154-_HBase_Frontdoor_API_WIP.patch


 Once a mapper in the MR index job succeeds, it should not need to be re-done 
 in the event of the failure of one of the other mappers. The initial 
 population of an index is based on a snapshot in time, so new rows getting 
 *after* the index build has started and/or failed do not impact it.
 Also, there's a 1:1 correspondence between index rows and table rows, so 
 there's really no need to dedup. However, the index rows will have a 
 different row key than the data table, so I'm not sure how the HFiles are 
 split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-20 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705319#comment-14705319
 ] 

maghamravikiran commented on PHOENIX-2154:
--

My bad. Will move it to the reduce method to be absolutely sure.

 Failure of one mapper should not affect other mappers in MR index build
 ---

 Key: PHOENIX-2154
 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: maghamravikiran
 Attachments: IndexTool.java, PHOENIX-2154-WIP.patch


 Once a mapper in the MR index job succeeds, it should not need to be re-done 
 in the event of the failure of one of the other mappers. The initial 
 population of an index is based on a snapshot in time, so new rows getting 
 *after* the index build has started and/or failed do not impact it.
 Also, there's a 1:1 correspondence between index rows and table rows, so 
 there's really no need to dedup. However, the index rows will have a 
 different row key than the data table, so I'm not sure how the HFiles are 
 split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

