[jira] [Commented] (PHOENIX-3609) Detect and fix corrupted local index region during compaction

2017-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829325#comment-15829325
 ] 

Hadoop QA commented on PHOENIX-3609:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12848231/PHOENIX-3609.patch
  against master branch at commit a675211909415ca376e432d25f8a8822aadf5712.
  ATTACHMENT ID: 12848231

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+PhoenixConnection conn = DriverManager.getConnection(getUrl()).unwrap(PhoenixConnection.class);
+try (HTableInterface metaTable = conn.getQueryServices().getTable(TableName.META_TABLE_NAME.getName());
+statement.execute("upsert into " + tableName + "  values(" + i + ",'fn" + i + "','ln" + i + "')");
+private void copyLocalIndexHFiles(Configuration conf, HRegionInfo fromRegion, HRegionInfo toRegion, boolean move)
+Path seondRegion = new Path(HTableDescriptor.getTableDir(root, fromRegion.getTableName()) + Path.SEPARATOR
+Path hfilePath = FSUtils.getCurrentFileSystem(conf).listFiles(seondRegion, true).next().getPath();
+Path firstRegionPath = new Path(HTableDescriptor.getTableDir(root, toRegion.getTableName()) + Path.SEPARATOR
+assertTrue(FileUtil.copy(currentFileSystem, hfilePath, currentFileSystem, firstRegionPath, move, conf));
+List scanners, ScanType scanType, long smallestReadPoint, long earliestPutTs,
+super(store, store.getScanInfo(), scan, scanners, scanType, smallestReadPoint, earliestPutTs);

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//console

This message is automatically generated.

> Detect and fix corrupted local index region during compaction
> -
>
> Key: PHOENIX-3609
> URL: https://issues.apache.org/jira/browse/PHOENIX-3609
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3609.patch
>
>
> Local index regions can be corrupted when hbck is run to fix overlapping 
> regions and their directories are simply merged to create a single region.
> We can detect this during compaction by looking at the start key of each 
> store file and comparing its prefix with the region start key. If the local 
> index for the region is found to be inconsistent, we will read the store 
> files of the corresponding data region and recreate the local index data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3549:
---
Fix Version/s: 4.10.0

> Fail if binary field in CSV is not properly base64 encoded
> --
>
> Key: PHOENIX-3549
> URL: https://issues.apache.org/jira/browse/PHOENIX-3549
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal reopened PHOENIX-3548:


Sorry, I closed this by mistake. The documentation is yet to go in, so I'm keeping it 
open.

> Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
> --
>
> Key: PHOENIX-3548
> URL: https://issues.apache.org/jira/browse/PHOENIX-3548
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded

2017-01-18 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829319#comment-15829319
 ] 

Ankit Singhal commented on PHOENIX-3549:


This is committed as a part of PHOENIX-3134.

> Fail if binary field in CSV is not properly base64 encoded
> --
>
> Key: PHOENIX-3549
> URL: https://issues.apache.org/jira/browse/PHOENIX-3549
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal resolved PHOENIX-3549.

Resolution: Fixed

> Fail if binary field in CSV is not properly base64 encoded
> --
>
> Key: PHOENIX-3549
> URL: https://issues.apache.org/jira/browse/PHOENIX-3549
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3610) Fix tableName used to get the index maintainers while creating HalfStoreFileReader for local index store

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3610:
---
Description: 
The physical table name is used instead of the Phoenix table name in 
IndexHalfStoreFileReaderGenerator#preStoreFileReaderOpen:
{code}
TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName();
..
try {
    conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap(
                PhoenixConnection.class);
    PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString());
    List<PTable> indexes = dataTable.getIndexes();
    Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers =
            new HashMap<ImmutableBytesWritable, IndexMaintainer>();
    for (PTable index : indexes) {
        if (index.getIndexType() == IndexType.LOCAL) {
            IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn);
            indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil
                    .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer);
        }
    }
    if (indexMaintainers.isEmpty()) return reader;
    byte[][] viewConstants = getViewConstants(dataTable);
    return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx
            .getEnvironment().getConfiguration(), indexMaintainers, viewConstants,
            childRegion, regionStartKeyInHFile, splitKey);
{code}

  was:
The physical table name is used instead of the Phoenix table name.
{code}
TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName();
..
try {
    conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap(
                PhoenixConnection.class);
    PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString());
    List<PTable> indexes = dataTable.getIndexes();
    Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers =
            new HashMap<ImmutableBytesWritable, IndexMaintainer>();
    for (PTable index : indexes) {
        if (index.getIndexType() == IndexType.LOCAL) {
            IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn);
            indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil
                    .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer);
        }
    }
    if (indexMaintainers.isEmpty()) return reader;
    byte[][] viewConstants = getViewConstants(dataTable);
    return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx
            .getEnvironment().getConfiguration(), indexMaintainers, viewConstants,
            childRegion, regionStartKeyInHFile, splitKey);
{code}


> Fix tableName used to get the index maintainers while creating 
> HalfStoreFileReader for local index store
> 
>
> Key: PHOENIX-3610
> URL: https://issues.apache.org/jira/browse/PHOENIX-3610
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
>
> The physical table name is used instead of the Phoenix table name in 
> IndexHalfStoreFileReaderGenerator#preStoreFileReaderOpen:
> {code}
> TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName();
> ..
> try {
>     conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap(
>                 PhoenixConnection.class);
>     PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString());
>     List<PTable> indexes = dataTable.getIndexes();
>     Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers =
>             new HashMap<ImmutableBytesWritable, IndexMaintainer>();
>     for (PTable index : indexes) {
>         if (index.getIndexType() == IndexType.LOCAL) {
>             IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn);
>             indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil
>                     .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer);
>         }
>     }
>     if (indexMaintainers.isEmpty()) return reader;
>     byte[][] viewConstants = getViewConstants(dataTable);
> 

[jira] [Created] (PHOENIX-3610) Fix tableName used to get the index maintainers while creating HalfStoreFileReader for local index store

2017-01-18 Thread Ankit Singhal (JIRA)
Ankit Singhal created PHOENIX-3610:
--

 Summary: Fix tableName used to get the index maintainers while 
creating HalfStoreFileReader for local index store
 Key: PHOENIX-3610
 URL: https://issues.apache.org/jira/browse/PHOENIX-3610
 Project: Phoenix
  Issue Type: Bug
Reporter: Ankit Singhal
Assignee: Ankit Singhal
 Fix For: 4.10.0


The physical table name is used instead of the Phoenix table name.
{code}
TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName();
..
try {
    conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap(
                PhoenixConnection.class);
    PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString());
    List<PTable> indexes = dataTable.getIndexes();
    Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers =
            new HashMap<ImmutableBytesWritable, IndexMaintainer>();
    for (PTable index : indexes) {
        if (index.getIndexType() == IndexType.LOCAL) {
            IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn);
            indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil
                    .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer);
        }
    }
    if (indexMaintainers.isEmpty()) return reader;
    byte[][] viewConstants = getViewConstants(dataTable);
    return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx
            .getEnvironment().getConfiguration(), indexMaintainers, viewConstants,
            childRegion, regionStartKeyInHFile, splitKey);
{code}
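
For illustration only (a hypothetical example, not taken from the patch): with namespace mapping enabled, the physical HBase table name uses a ':' separator while the Phoenix table name uses '.', so resolving the PTable with the physical name can fail or resolve the wrong table. The names below are made up.

{code}
// Hypothetical illustration: physical HBase name as returned by
// getTableDesc().getTableName() for a namespace-mapped table.
String physicalName = "MY_SCHEMA:MY_TABLE";
// Phoenix table name expected by PhoenixRuntime.getTableNoCache():
String phoenixName = physicalName.replace(':', '.');   // "MY_SCHEMA.MY_TABLE"
// PTable dataTable = PhoenixRuntime.getTableNoCache(conn, phoenixName);
{code}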



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3609:
---
Attachment: PHOENIX-3609.patch

> Detect and fix corrupted local index region during compaction
> -
>
> Key: PHOENIX-3609
> URL: https://issues.apache.org/jira/browse/PHOENIX-3609
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3609.patch
>
>
> Local index regions can be corrupted when hbck is run to fix overlapping 
> regions and their directories are simply merged to create a single region.
> We can detect this during compaction by looking at the start key of each 
> store file and comparing its prefix with the region start key. If the local 
> index for the region is found to be inconsistent, we will read the store 
> files of the corresponding data region and recreate the local index data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3609:
---
Attachment: (was: BUG-67084_v1.patch)

> Detect and fix corrupted local index region during compaction
> -
>
> Key: PHOENIX-3609
> URL: https://issues.apache.org/jira/browse/PHOENIX-3609
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3609.patch
>
>
> Local index regions can be corrupted when hbck is run to fix overlapping 
> regions and their directories are simply merged to create a single region.
> We can detect this during compaction by looking at the start key of each 
> store file and comparing its prefix with the region start key. If the local 
> index for the region is found to be inconsistent, we will read the store 
> files of the corresponding data region and recreate the local index data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3609:
---
Attachment: BUG-67084_v1.patch

[~chrajeshbab...@gmail.com], can you please review the attached patch?

-- The current check simply compares the prefix of each store file's first row key 
with the region start key and rebuilds the region using the data table's store files.
-- It can give a false negative in cases where the data happens to be laid out such 
that the prefix matches anyway; those cases are not yet handled. I may also check the 
IndexId and see if it is less than the highest indexId.
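
A minimal sketch of the kind of prefix check described above (hypothetical helper name, not the actual patch code; Bytes is org.apache.hadoop.hbase.util.Bytes):

{code}
// Returns true if a local index store file plausibly belongs to this region,
// based on whether its first row key starts with the region start key.
private static boolean storeFileBelongsToRegion(byte[] storeFileFirstRowKey,
        byte[] regionStartKey) {
    // An empty region start key means the first region, which matches everything.
    if (regionStartKey.length == 0) {
        return true;
    }
    // If the store file's first row key does not start with the region start key,
    // the local index store file likely came from another (merged) region.
    return Bytes.startsWith(storeFileFirstRowKey, regionStartKey);
}
{code}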


> Detect and fix corrupted local index region during compaction
> -
>
> Key: PHOENIX-3609
> URL: https://issues.apache.org/jira/browse/PHOENIX-3609
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Attachments: BUG-67084_v1.patch
>
>
> Local index regions can be corrupted when hbck is run to fix overlapping 
> regions and their directories are simply merged to create a single region.
> We can detect this during compaction by looking at the start key of each 
> store file and comparing its prefix with the region start key. If the local 
> index for the region is found to be inconsistent, we will read the store 
> files of the corresponding data region and recreate the local index data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3609) Detect and fix corrupted local index region during compaction

2017-01-18 Thread Ankit Singhal (JIRA)
Ankit Singhal created PHOENIX-3609:
--

 Summary: Detect and fix corrupted local index region during 
compaction
 Key: PHOENIX-3609
 URL: https://issues.apache.org/jira/browse/PHOENIX-3609
 Project: Phoenix
  Issue Type: Bug
Reporter: Ankit Singhal
Assignee: Ankit Singhal


Local index regions can be corrupted when hbck is run to fix overlapping 
regions and their directories are simply merged to create a single region.

We can detect this during compaction by looking at the start key of each store 
file and comparing its prefix with the region start key. If the local index for 
the region is found to be inconsistent, we will read the store files of the 
corresponding data region and recreate the local index data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE

2017-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829297#comment-15829297
 ] 

Hadoop QA commented on PHOENIX-3608:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12848212/PHOENIX-3608.patch
  against master branch at commit a675211909415ca376e432d25f8a8822aadf5712.
  ATTACHMENT ID: 12848212

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.tx.TransactionIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//console

This message is automatically generated.

> KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE
> 
>
> Key: PHOENIX-3608
> URL: https://issues.apache.org/jira/browse/PHOENIX-3608
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
> Fix For: 4.10.0, 4.8.2
>
> Attachments: PHOENIX-3608.patch
>
>
> Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when 
> the left-side key range is the NULL_RANGE and the right side is not, but we 
> should return EMPTY_RANGE when either of the key ranges is the NULL_RANGE.
> {noformat}
> if (this == IS_NULL_RANGE) {
>     if (range == IS_NULL_RANGE) {
>         return IS_NULL_RANGE;
>     }
>     return EMPTY_RANGE;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE

2017-01-18 Thread Rajeshbabu Chintaguntla (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-3608:
-
Attachment: PHOENIX-3608.patch

Here is a trivial patch. [~jamestaylor], please review.

> KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE
> 
>
> Key: PHOENIX-3608
> URL: https://issues.apache.org/jira/browse/PHOENIX-3608
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
> Attachments: PHOENIX-3608.patch
>
>
> Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when 
> the left-side key range is the NULL_RANGE and the right side is not, but we 
> should return EMPTY_RANGE when either of the key ranges is the NULL_RANGE.
> {noformat}
> if (this == IS_NULL_RANGE) {
>     if (range == IS_NULL_RANGE) {
>         return IS_NULL_RANGE;
>     }
>     return EMPTY_RANGE;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE

2017-01-18 Thread Rajeshbabu Chintaguntla (JIRA)
Rajeshbabu Chintaguntla created PHOENIX-3608:


 Summary: KeyRange intersect should return EMPTY_RANGE when one of 
them is NULL_RANGE
 Key: PHOENIX-3608
 URL: https://issues.apache.org/jira/browse/PHOENIX-3608
 Project: Phoenix
  Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla


Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when the 
left-side key range is the NULL_RANGE and the right side is not, but we should 
return EMPTY_RANGE when either of the key ranges is the NULL_RANGE.
{noformat}
if (this == IS_NULL_RANGE) {
    if (range == IS_NULL_RANGE) {
        return IS_NULL_RANGE;
    }
    return EMPTY_RANGE;
}
{noformat}
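
A minimal sketch of the proposed guard (assumed shape only; the actual patch may differ):

{noformat}
// Return EMPTY_RANGE when either side is the NULL_RANGE, keeping the
// both-null case as IS_NULL_RANGE.
if (this == IS_NULL_RANGE || range == IS_NULL_RANGE) {
    if (this == IS_NULL_RANGE && range == IS_NULL_RANGE) {
        return IS_NULL_RANGE;
    }
    return EMPTY_RANGE;
}
{noformat}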




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829044#comment-15829044
 ] 

Andrew Purtell commented on PHOENIX-3607:
-

We can evict from the cache by LRU regardless, though I see that will be another 
JIRA, as it should be.

We get Hadoop semantics up through HBase via User because User refers to UGI, and 
Phoenix shouldn't change those semantics; testing User equality by name instead of 
by Subject identity would be such a change. I suggested this in a private 
discussion but now believe it to be wrong. Allowing only one cached entry per 
user name is something Phoenix could choose to do, though it won't be necessary if 
we expire connections out of the cache by LRU in a reasonably timely manner. 
That will avoid leaks.

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829028#comment-15829028
 ] 

Geoffrey Jacoby edited comment on PHOENIX-3607 at 1/19/17 12:14 AM:


The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and 
https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. 
HADOOP-6670 introduced the System.Object hashing, and HADOOP-12529 was a 
proposal to switch the hashing back that several developers argued against. 

Apparently the Hadoop developers had concerns with allowing Users with 
semantically equal username/subject/principal to count as equal because their 
respective UGI objects could have different Credentials attached. They were 
apparently worried that caching based on those assumptions could lead to either 
access denied errors or privilege escalations from clients getting connections 
with the wrong authentication.  


was (Author: gjacoby):
The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and 
https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. 
HADOOP-6670 introduced the System.Object hashing, and 

Apparently the Hadoop developers had concerns with allowing Users with 
semantically equal username/subject/principal to count as equal because their 
respective UGI objects could have different Credentials attached. They were 
apparently worried that caching based on those assumptions could lead to either 
access denied errors or privilege escalations from clients getting connections 
with the wrong authentication.  

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829028#comment-15829028
 ] 

Geoffrey Jacoby commented on PHOENIX-3607:
--

The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and 
https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. 
HADOOP-6670 introduced the System.Object hashing, and 

Apparently the Hadoop developers had concerns with allowing Users with 
semantically equal username/subject/principal to count as equal because their 
respective UGI objects could have different Credentials attached. They were 
apparently worried that caching based on those assumptions could lead to either 
access denied errors or privilege escalations from clients getting connections 
with the wrong authentication.  

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3519) Add COLUMN_ENCODED_BYTES table property

2017-01-18 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3519:

Attachment: PHOENIX-3519-v2.patch

[~jamestaylor]

Thanks for the feedback, I have attached a v2 patch with review changes.

When we add a column that already exists in a child view to the base table, it 
is propagated to the child views, and the existing column's ordinal position in 
the child view might change to match that of the base table.
For tables with encoded column qualifiers we don't allow users to add an 
existing view column to the base table.
I changed the default behavior to not use encoded column qualifiers in this 
patch (and reverted this change in PHOENIX-3586).

I had to make the following change in MetaDataClient.createTable, as the 
PostDDLCompiler plan was not being run for transactional tables, so stats were 
not available after a transactional table was created.

I changed the check to be
{code}
if (isImmutableRows && encodingScheme != NON_ENCODED_QUALIFIERS) {
    // force store nulls to true so delete markers aren't used
    storeNulls = true;
    tableProps.put(PhoenixDatabaseMetaData.STORE_NULLS, Boolean.TRUE);
}
{code}

I have added a unit test for "NONE".

> Add COLUMN_ENCODED_BYTES table property
> ---
>
> Key: PHOENIX-3519
> URL: https://issues.apache.org/jira/browse/PHOENIX-3519
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3519.patch, PHOENIX-3519-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Phoenix ODBC user experience

2017-01-18 Thread Biju N
Hi There,
If you are using the Phoenix ODBC driver, could you please share your
experience in terms of stability of the setup (since it requires the Query
Server) and read/write performance? Performance is subjective, but it will give
us some data points to assess the option of using the ODBC driver with C++.

Thanks in advance for your feedback.

- Biju


[jira] [Comment Edited] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828857#comment-15828857
 ] 

Thomas D'Silva edited comment on PHOENIX-3586 at 1/18/17 10:11 PM:
---

I have attached a v3 patch which sets the immutableStorageScheme in 
SingleCellColumnExpression by reading the last byte of the serialized array. I 
also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. 
I forgot to mention that these serialization changes are backward compatible 
because SingleCellColumnExpression was newly added.

I was going to parameterize more tests in a separate JIRA, PHOENIX-3446.


was (Author: tdsilva):
I have attached a v3 patch which sets the immutableStorageScheme in 
SingleCellColumnExpression by reading the last byte of the serialized array. I 
also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. 
I forgot to mention that these serialization changes are backward compatible 
because SingleCellColumnExpression was newly added.

I was going to parameterize more tests in a separate JIRA, PHOENIX-3519.

> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch, 
> PHOENIX-3586-v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3586:

Attachment: PHOENIX-3586-v3.patch

I have attached a v3 patch which sets the immutableStorageScheme in 
SingleCellColumnExpression by reading the last byte of the serialized array. I 
also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. 
I forgot to mention that these serialization changes are backward compatible 
because SingleCellColumnExpression was newly added.

I was going to parameterize more tests in a separate JIRA, PHOENIX-3519.
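
As a rough, self-contained sketch of the last-byte idea (hypothetical names; the real enum lives in Phoenix and is shown in a later comment on this issue):

{code}
// Hypothetical sketch: recover the storage scheme from the trailing byte of the
// serialized single-cell array.
enum Scheme {
    ONE_CELL_PER_COLUMN((byte) 1), SINGLE_CELL_ARRAY_WITH_OFFSETS((byte) 2);
    final byte serializedValue;
    Scheme(byte b) { this.serializedValue = b; }
    static Scheme fromSerializedValue(byte b) {
        for (Scheme s : values()) {
            if (s.serializedValue == b) return s;
        }
        throw new IllegalArgumentException("Unknown scheme byte: " + b);
    }
}

static Scheme schemeOf(byte[] serializedArray) {
    // The scheme's serialized byte is assumed to be the last byte of the array.
    return Scheme.fromSerializedValue(serializedArray[serializedArray.length - 1]);
}
{code}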

> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch, 
> PHOENIX-3586-v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828822#comment-15828822
 ] 

Geoffrey Jacoby commented on PHOENIX-3607:
--

So then I think it comes down to a simple question: what should happen to the 
cache if a ConnectionInfo with a new-but-semantically-identical User comes 
along?

If the correct answer is "We shouldn't reuse the existing cached CQSI because 
ConnectionInfos are immutable and the new User should make it count as a 
different ConnectionInfo", then we should create a new CQSI, but the code 
should synchronously make sure that any semantically-identical ConnectionInfos 
are explicitly expired out of the cache at that time so we don't leak.

On the other hand, if the correct answer is "We should reuse the existing 
cached CQSI", then we should change the hashCode as proposed in this JIRA, so 
that the new ConnectionInfo hashes to the same value as the old one. 

Or, to put it another way: should we use object or semantic equality for 
ConnectionInfo objects?

Either way, I think we should go forward with the LRU expiration of the cache. 
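
For reference, a minimal sketch of what LRU-style expiration could look like with Guava's cache (illustrative only; the key/value types, timeout, and eviction action are assumptions, not the eventual patch):

{code}
// Illustrative sketch using Guava (already a Phoenix dependency).
RemovalListener<ConnectionInfo, ConnectionQueryServices> closeOnEvict =
    notification -> {
        try {
            // release the underlying HConnection when the entry is evicted
            notification.getValue().close();
        } catch (Exception e) {
            // log and continue; eviction must not fail
        }
    };
Cache<ConnectionInfo, ConnectionQueryServices> cqsCache = CacheBuilder.newBuilder()
    .expireAfterAccess(30, TimeUnit.MINUTES)   // evict idle entries
    .removalListener(closeOnEvict)
    .build();
{code}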

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828733#comment-15828733
 ] 

Josh Elser commented on PHOENIX-3607:
-

Thanks, Geoffrey. This is super helpful.

bq. 1. Client requests a secure connection. The PhoenixEmbeddedDriver code logs 
them in, creates a ConnectionInfo, and caches, then returns the connection to 
the client.

Can you be more specific here? Does this mean that you're using the automatic 
Kerberos login via the JDBC url properties? If so, PHOENIX-3232 is relevant.

I'm still trying to re-wrap my head around the problem. I'm thinking that it's 
related to the final User field on the ConnectionInfo. I really don't think 
that Phoenix is managing these credentials correctly. PHOENIX-3126 was the 
original case which outlined this. Maybe ConnectionInfos should only use the 
user name to reference the ConnectionQueryServices object, but this doesn't do 
anything to prevent {{UGI.getCurrentUser()}} from not matching what is 
specified in the ConnectionInfo.

Along these lines, a comment I left to try to explain this.

{noformat}
// PHOENIX-3189 Because ConnectionInfo is immutable, we must make sure all parts of it are correct before
// construction; this also requires the Kerberos user credentials object (since they are compared by reference
// and not by value. If the user provided a principal and keytab via the JDBC url, we must make sure that the
// Kerberos login happens *before* we construct the ConnectionInfo object. Otherwise, the use of ConnectionInfo
// to determine when ConnectionQueryServices impl's should be reused will be broken.
{noformat}

bq.  2. Under a separate JIRA, we'd like to make the ConnectionInfo/CQSI cache 
gradually expire entries in LRU fashion. 

This sounds like a good idea. Hopefully, we can do this in a way that doesn't 
cause clients to do anything special when the elements are evicted.

Finally, sorry for leaving this mess in a bad state -- LMK how I can help.

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3584) Expose metrics for ConnectionQueryServices instances and their allocators in the JVM

2017-01-18 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828715#comment-15828715
 ] 

Sergey Soldatov commented on PHOENIX-3584:
--

Can this trace logging in ConnectionQueryServicesImpl be done in another way? 
When the Spark shell is started with Phoenix, those two pages of exceptions may 
lead to a heart attack.

> Expose metrics for ConnectionQueryServices instances and their allocators in 
> the JVM
> 
>
> Key: PHOENIX-3584
> URL: https://issues.apache.org/jira/browse/PHOENIX-3584
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Samarth Jain
> Fix For: 4.9.1, 4.10.0
>
> Attachments: PHOENIX-3584.patch
>
>
> In the case a client is leaking Phoenix/HBase connections it would be helpful 
> to have metrics available on the number of ConnectionQueryServices 
> (ConnectionQueryServicesImpl) instances and who has allocated them. 
> For the latter, we could get a stack trace when ConnectionQueryServicesImpls 
> are allocated (which should be relatively rare) and keep a count keyed by a hash 
> of the call stack (and save the call stack). Then we need a method to dump the 
> hash-to-callstack map as a string. This method can be called remotely via JMX when 
> debugging leaks in a live environment. Perhaps after the count of 
> ConnectionQueryServicesImpls goes over a configurable threshold we can also 
> log warnings that dump the counts by hash and callstacks corresponding to 
> those hashes. 
> Or, we should only have multiple ConnectionQueryServicesImpls if an optional 
> parameter is passed in the JDBC connect string. We could keep counts by that 
> parameter string and dump that instead of call stacks. 
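
A rough sketch of the call-stack counting idea described above (hypothetical names; not the attached patch; Throwables is Guava's com.google.common.base.Throwables):

{code}
// Count ConnectionQueryServicesImpl allocations by a hash of the allocating
// call stack, keeping one sample stack per hash for later dumping via JMX.
private static final ConcurrentMap<Integer, AtomicLong> COUNTS = new ConcurrentHashMap<>();
private static final ConcurrentMap<Integer, String> SAMPLE_STACKS = new ConcurrentHashMap<>();

static void recordAllocation() {
    String stack = Throwables.getStackTraceAsString(new Exception("CQSI allocation"));
    int key = stack.hashCode();
    SAMPLE_STACKS.putIfAbsent(key, stack);
    COUNTS.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
}

static String dumpAllocations() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Integer, AtomicLong> e : COUNTS.entrySet()) {
        sb.append("count=").append(e.getValue().get()).append('\n')
          .append(SAMPLE_STACKS.get(e.getKey())).append('\n');
    }
    return sb.toString();
}
{code}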



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3519) Add COLUMN_ENCODED_BYTES table property

2017-01-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828692#comment-15828692
 ] 

James Taylor commented on PHOENIX-3519:
---

Thanks for the patch, [~tdsilva]. Here's some feedback:
- Is this because we'll allocate a new slot for the new column and they won't 
line up based on name now? I think we should explain that.
{code}
+// For the non-diverged view, adding the column VIEW_COL2 will end up changing its ordinal position in the view.
{code}
- Why is this change needed?
{code}
-if (table == null || table.getType() == PTableType.VIEW || table.isTransactional()) {
+if (table == null || table.getType() == PTableType.VIEW /*|| table.isTransactional()*/) {
{code}
- Is this only when we store everything in a single KeyValue?
{code}
+if (isImmutableRows) {
+// force store nulls to true so delete markers aren't used
+storeNulls = true;
+tableProps.put(PhoenixDatabaseMetaData.STORE_NULLS, Boolean.TRUE);
+} 
+
{code}
- Do we have a unit test for usage of NONE?
{code}
+   if ("NONE".equalsIgnoreCase(strValue)) {
+   return 0;
+   } 
{code}

> Add COLUMN_ENCODED_BYTES table property
> ---
>
> Key: PHOENIX-3519
> URL: https://issues.apache.org/jira/browse/PHOENIX-3519
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3519.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828684#comment-15828684
 ] 

Geoffrey Jacoby commented on PHOENIX-3607:
--

[~elserj] We're seeing Phoenix connections leak over time in secure clusters. 
One use case we're concerned about is:

1. Client requests a secure connection. The PhoenixEmbeddedDriver code logs 
them in, creates a ConnectionInfo, and caches, then returns the connection to 
the client.

2. Sometime later, the client requests a secure connection again with the same 
credentials. ConnectionInfo#normalize realizes that the kerberos ticket has 
expired, and re-logs in with a different User object and different Subject, but 
semantically equal to the ConnectionInfo in step 1. 

3. It tries to look up a cached ConnectionQueryServicesImpl/HConnection in the 
cache, but it misses because the hash codes of the two ConnectionInfos are 
different. A new ConnectionQueryServicesImpl and HConnection are created, and 
put in the cache. 

4. The old cache entry from step 1 will never be used or purged, and so its 
CQSI and HConnection leak.

The two changes we're proposing as fixes are:
1. Change the hashing function of ConnectionInfo so that semantically identical 
instances will return the same hash code even if their User subjects are 
different instances (e.g. use User.getUserName() instead of User in 
ConnectionInfo.hashCode()) 
2. Under a separate JIRA, we'd like to make the ConnectionInfo/CQSI cache 
gradually expire entries in LRU fashion. 

[~apurtell]

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828665#comment-15828665
 ] 

James Taylor commented on PHOENIX-3586:
---

bq. If we derive the immutableStorageScheme from the serialized array, then all 
future serialization formats will need to store the enum ordinal in the 
serialized array. It is only written and read once.
We already serialize a byte for the format, right? Can this correspond to the 
enum ordinal? It would be nice if it were the last byte in the serialized bytes so 
we know where to find it. I think we need it in the data itself because if/when it 
changes, there will potentially be a mix of old and new on a row-by-row basis. 
WDYT?

Are we planning to parameterize more tests? IndexIT, perhaps?


> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828655#comment-15828655
 ] 

Josh Elser commented on PHOENIX-3607:
-

I think there were issues WRT proxy-user authentication in this case. When the 
end-user was riding on top of some other user's credentials (I think this was 
ultimately stemming from an Apache Zeppelin use-case), it was leaking 
connections.

I also remember the convenience JDBC URL principal+keytab properties being 
problematic.

Is there a concrete case that you can share, [~gjacoby]? It would be good to 
have something firm that we can show does/does-not work, and then work from 
there. In general, this is rather difficult to do correctly due to the caching 
that PhoenixDriver tries to do. The inversion of control WRT the 
{{Subject.doAs}} calls makes it tough for Phoenix to encapsulate this while 
supporting multiple users in the same JVM.

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3586:

Attachment: PHOENIX-3586-v2.patch

> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3586:

Attachment: (was: PHOENIX-3586-v2.patch)

> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes

2017-01-18 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3586:

Attachment: PHOENIX-3586-v2.patch

[~jamestaylor]

Thanks for the review; I have attached a v2 patch. I refactored the 
ImmutableStorageScheme enum: previously the array data type builder was an 
instance variable, which was not thread safe.

Now the enum implements the following interface

{code}
+interface ColumnValueEncoderDecoderSupplier {
+    ColumnValueEncoder getEncoder(int numElements);
+    ColumnValueDecoder getDecoder();
 }

+public enum ImmutableStorageScheme implements ColumnValueEncoderDecoderSupplier {
+    ONE_CELL_PER_COLUMN((byte)1) {
+        @Override
+        public ColumnValueEncoder getEncoder(int numElements) {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public ColumnValueDecoder getDecoder() {
+            throw new UnsupportedOperationException();
+        }
+    },
+    // stores a single cell per column family that contains all serialized column values
+    SINGLE_CELL_ARRAY_WITH_OFFSETS((byte)2) {
+        @Override
+        public ColumnValueEncoder getEncoder(int numElements) {
+            PDataType type = PVarbinary.INSTANCE;
+            int estimatedSize = PArrayDataType.estimateSize(numElements, type);
+            TrustedByteArrayOutputStream byteStream = new TrustedByteArrayOutputStream(estimatedSize);
+            DataOutputStream oStream = new DataOutputStream(byteStream);
+            return new PArrayDataTypeEncoder(byteStream, oStream, numElements, type,
+                SortOrder.ASC, false, PArrayDataType.IMMUTABLE_SERIALIZATION_VERSION);
+        }
+
+        @Override
+        public ColumnValueDecoder getDecoder() {
+            return new PArrayDataTypeDecoder();
+        }
+    };
{code}

Many of the test changes were because in PHOENIX-3519 I changed the default to 
not use encoded column qualifiers and the SINGLE_CELL_ARRAY_WITH_OFFSETS storage 
scheme; I reverted that change in this patch. This caused changes to 
AlterMultiTenantTableWithViewsIT, StatsCollectorIT and MutationStateTest.

The following tests were parameterized : AlterTableIT, AlterTableWithViewsIT, 
StoreNullsIT, StatsCollectorIT

The following tests changed because I renamed StorageScheme to 
ImmutableStorageScheme : CorrelatePlanTest and LiteralResultIteratorPlanTest

I refactored ArrayConstructorExpressionTest and moved some of the other tests into 
a new test, ImmutableStorageSchemeTest.

I renamed PArrayDataTypeBytesArrayBuilder to PArrayDataTypeEncoder and so 
PDataTypeForArraysTest changed.

[~samarthjain] is going to add tests for different # of bytes for the column 
qualifier.

COLUMN_ENCODED_BYTES does have a "none" option in the patch I attached to 
PHOENIX-3519.

If we derive the immutableStorageScheme from the serialized array, then all 
future serialization formats will need to store the enum ordinal in the 
serialized array. It is only written and read once. 
WritableUtils.writeEnum serializes the enum using its name; in the v2 patch I 
modified the code to use the enum ordinal.

Samarth handled adding the IMMUTABLE_STORAGE_SCHEME column during the schema 
upgrade in PHOENIX-3447.

> Add StorageScheme table property to allow users to specify their custom 
> storage schemes
> ---
>
> Key: PHOENIX-3586
> URL: https://issues.apache.org/jira/browse/PHOENIX-3586
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828553#comment-15828553
 ] 

James Taylor commented on PHOENIX-3607:
---

[~elserj] - remember PHOENIX-3189 and the discussion we had about putting User in 
the hashCode of ConnectionInfo [1][2]? It's causing issues. Can you remind us 
why this was done? It seems that using user.getName() instead would not be good 
for PQS?

[1] https://github.com/apache/phoenix/pull/191#issuecomment-242566530
[2] https://github.com/apache/phoenix/pull/191#issuecomment-243230922

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-01-18 Thread Geoffrey Jacoby (JIRA)
Geoffrey Jacoby created PHOENIX-3607:


 Summary: Change hashCode calculation for caching 
ConnectionQueryServicesImpls
 Key: PHOENIX-3607
 URL: https://issues.apache.org/jira/browse/PHOENIX-3607
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.8.0, 4.9.0
Reporter: Geoffrey Jacoby
Assignee: Geoffrey Jacoby


PhoenixDriver maintains a cache of ConnectionInfo -> 
ConnectionQueryServicesImpl (each of which holds a single HConnection) : 

The hash code of ConnectionInfo in part uses the hash code of its HBase User 
object, which uses the *identity hash* of the Subject allocated at login. There 
are concerns about the stability of this hashcode. When we log out and log in 
after TGT refresh, will we have a new Subject?

To be defensive, we should do a hash of the string returned by user.getName() 
instead.
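
A minimal sketch of the proposed hashCode change (illustrative only; the ConnectionInfo field names below are assumptions):

{code}
// Illustrative only: hash the user by name rather than by the User object's
// identity hash, so a re-login with a fresh Subject still hits the same cache entry.
@Override
public int hashCode() {
    String userName = user == null ? null : user.getName();
    return Objects.hash(zookeeperQuorum, port, rootNode, principal, keytab, userName);
}
{code}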



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info

2017-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828412#comment-15828412
 ] 

Hadoop QA commented on PHOENIX-3600:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12848082/PHOENIX-3600.patch
  against master branch at commit a675211909415ca376e432d25f8a8822aadf5712.
  ATTACHMENT ID: 12848082

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+private List generateSplits(final QueryPlan qplan, final List splits, Configuration config) throws IOException {
+org.apache.hadoop.hbase.client.Connection connection = ConnectionFactory.createConnection(config);
+psplits.add(new PhoenixInputSplit(Lists.newArrayList(aScan), regionSize, regionLocation));

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//console

This message is automatically generated.

> Core MapReduce classes don't provide location info
> --
>
> Key: PHOENIX-3600
> URL: https://issues.apache.org/jira/browse/PHOENIX-3600
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3600.patch
>
>
> The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} 
> and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region 
> size or location information, leaving the execution engine (MR, Spark, etc.) 
> to randomly assign splits to nodes.
> Interestingly, the phoenix-hive module has reimplemented these classes, 
> including the node-aware functionality. We should port a subset of those 
> changes back to the core code so that other engines can make use of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


ApacheCon CFP closing soon (11 February)

2017-01-18 Thread Rich Bowen
Hello, fellow Apache enthusiast. Thanks for your participation in, and
interest in, the projects of the Apache Software Foundation.

I wanted to remind you that the Call For Papers (CFP) for ApacheCon
North America, and Apache: Big Data North America, closes in less than a
month. If you've been putting it off because there was lots of time
left, it's time to dig for that inspiration and get those talk proposals in.

It's also time to discuss with your developer and user community whether
there's a track of talks that you might want to propose, so that you
have more complete coverage of your project than a talk or two.

We're looking for talks directly, and indirectly, related to projects at
the Apache Software Foundation. These can be anything from in-depth
technical discussions of the projects you work with, to talks about
community, documentation, legal issues, marketing, and so on. We're also
very interested in talks about projects and services built on top of
Apache projects, and case studies of how you use Apache projects to
solve real-world problems.

We are particularly interested in presentations from Apache projects
either in the Incubator, or recently graduated. ApacheCon is where
people come to find out what technology they'll be using this time next
year.

Important URLs are:

To submit a talk for Apache: Big Data -
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
To submit a talk for ApacheCon -
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

To register for Apache: Big Data -
http://events.linuxfoundation.org/events/apache-big-data-north-america/attend/register-
To register for ApacheCon -
http://events.linuxfoundation.org/events/apachecon-north-america/attend/register-

Early Bird registration rates end March 12th, but if you're a committer
on an Apache project, you get the low committer rate, which is less than
half of the early bird rate!

For further updates about ApacheCon, follow us on Twitter, @ApacheCon,
or drop by our IRC channel, #apachecon on the Freenode IRC network. Or
contact me - rbo...@apache.org - with any questions or concerns.

Thanks!

Rich Bowen, VP Conferences, Apache Software Foundation

-- 
(You've received this email because you're on a dev@ or users@ mailing
list of an Apache Software Foundation project. For subscription and
unsubscription information, consult the headers of this email message,
as this varies from one list to another.)


[jira] [Commented] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark

2017-01-18 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828357#comment-15828357
 ] 

Josh Mahonin commented on PHOENIX-3601:
---

Trivial patch; most of the functionality comes from PHOENIX-3600. Unfortunately, 
since PhoenixRDD extends 'RDD' and not 'NewHadoopRDD', we don't get some of the 
niceties for free. There was a good reason for this that's now lost to me...

TL;DR: If used in conjunction with PHOENIX-3600, I observed Spark data load 
times decrease by 30-40%.

 Longer version:

Using [~elserj]'s take on the very cool 
https://github.com/joshelser/phoenix-performance toolset, I generated about 
114M rows of TPC-DS data on a 5 RegionServer setup. I used a load-factor of 5, 
which created a 256-way split table we'll refer to as SALES. I also created a 
new table, pre-salted with 5 buckets we'll call SALES2 and UPSERT SELECTed the 
data over. Both tables had major compaction and UPDATE STATISTICS run on them 
as well.

Using HDP 2.5 (Phoenix 4.7, Spark 1.6), I invoked spark-shell with 5 executors 
and 2 cores each. Each executor was co-located with one Region Server. I then 
created a Phoenix RDD for each table, and then ran a Spark {{rdd.count}} 
operation on them. This effectively loads the entire table into Spark, and then 
Spark counts the rows. I ran this for each table, using the default case, then 
just the location changes, then the location changes plus the split.by.stats 
changes and recorded the run-times 4 times each. I also closed out the 
spark-shell and ensured any Spark-cached files were removed, although I didn't 
account for caching on the HBase or OS side.
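
For reference, a rough Java equivalent of that load-and-count is below, shown through the phoenix-spark DataFrame data source rather than the PhoenixRDD API used in spark-shell; the table name and ZooKeeper quorum are placeholders.

{code:java}
// Illustrative sketch of a full-table load followed by a count (Spark 1.6-era API).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixCountJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("phoenix-count");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Load the whole table through the phoenix-spark data source;
        // "SALES" and the quorum string are placeholders for this sketch.
        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "SALES")
                .option("zkUrl", "zk-host:2181")
                .load();

        // Force a full scan and count, mirroring the rdd.count measurement above.
        System.out.println("rows = " + df.count());

        jsc.stop();
    }
}
{code}

With PHOENIX-3600/3601 in place, the resulting partitions carry preferred locations, so Spark can schedule each task on the executor co-located with the owning RegionServer.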


||SALES (256 regions, 261 stats splits)||t1||t2||t3||t4||
|control|120s|116s|111s|125s|
|location|96s|106s|94s|100s|
|location+stats|82s|74s|82s|82s|  
||SALES2 (10 regions, 50 stats splits) ||t1||t2||t3||t4||
|control|102s|83s|92s|96s|
|location|94s|78s|90s|81s|
|location+stats|62s|70s|79s|58s|

I have more screencaps of the Spark executors that report on the various task 
jobs, but in short, what we see is the individual task times are much more 
evenly distributed (i.e. fewer outliers), and the overall task time is also 
decreased due to less network overhead.

If anyone's using phoenix-spark and is able to test it out, that would be 
great. Also cc [~maghamraviki...@gmail.com] [~ndimiduk] [~sergey.soldatov] 
[~elserj] [~jamestaylor]

> PhoenixRDD doesn't expose the preferred node locations to Spark
> ---
>
> Key: PHOENIX-3601
> URL: https://issues.apache.org/jira/browse/PHOENIX-3601
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3601.patch
>
>
> Follow-up to PHOENIX-3600, in order to let Spark know the preferred node 
> locations to assign partitions to, we need to update PhoenixRDD to retrieve 
> the underlying node location information from the splits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark

2017-01-18 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin updated PHOENIX-3601:
--
Attachment: PHOENIX-3601.patch

> PhoenixRDD doesn't expose the preferred node locations to Spark
> ---
>
> Key: PHOENIX-3601
> URL: https://issues.apache.org/jira/browse/PHOENIX-3601
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3601.patch
>
>
> Follow-up to PHOENIX-3600, in order to let Spark know the preferred node 
> locations to assign partitions to, we need to update PhoenixRDD to retrieve 
> the underlying node location information from the splits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info

2017-01-18 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828307#comment-15828307
 ] 

Josh Mahonin commented on PHOENIX-3600:
---

Mostly just ported the Phoenix/MR specific code from:

https://github.com/apache/phoenix/blob/master/phoenix-hive/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputFormat.java#L151-L216

https://github.com/apache/phoenix/blob/master/phoenix-hive/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputSplit.java

Also included a new Configuration property "phoenix.mapreduce.split.by.stats" 
which does effectively the same thing as the Hive-specific "split.by.stats". In 
short, the MR code was only generating InputSplits based on Region Splits, and 
wasn't taking into account the possibility of more scans being generated by the 
statistics collection.
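
To make the shape of that change concrete, here is an illustrative sketch of node-aware split generation, under the assumption (taken from the lineLengths excerpt in the QA report) that PhoenixInputSplit can be constructed from a list of scans, a region size, and a location; it is not the patch itself.

{code:java}
// Illustrative sketch only; the real logic lives in PhoenixInputFormat.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.common.collect.Lists;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.phoenix.compile.QueryPlan;
import org.apache.phoenix.mapreduce.PhoenixInputSplit;

class SplitGenerationSketch {
    List<InputSplit> generateSplitsSketch(QueryPlan queryPlan, Configuration config,
            String tableName) throws IOException {
        boolean splitByStats = config.getBoolean("phoenix.mapreduce.split.by.stats", false);
        List<InputSplit> psplits = new ArrayList<>();
        try (org.apache.hadoop.hbase.client.Connection connection =
                     ConnectionFactory.createConnection(config);
             RegionLocator locator =
                     connection.getRegionLocator(TableName.valueOf(tableName))) {
            // QueryPlan.getScans() groups the stats-guided scans per region.
            for (List<Scan> regionScans : queryPlan.getScans()) {
                HRegionLocation location =
                        locator.getRegionLocation(regionScans.get(0).getStartRow(), false);
                String regionLocation = location.getHostname();
                long regionSize = 0L; // the real code can use RegionSizeCalculator here
                if (splitByStats) {
                    // One split per guidepost-generated scan: more parallelism,
                    // each split still tagged with the hosting RegionServer.
                    for (Scan aScan : regionScans) {
                        psplits.add(new PhoenixInputSplit(
                                Lists.newArrayList(aScan), regionSize, regionLocation));
                    }
                } else {
                    // One split per region, carrying the same location hint.
                    psplits.add(new PhoenixInputSplit(regionScans, regionSize, regionLocation));
                }
            }
        }
        return psplits;
    }
}
{code}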

I'll follow up on PHOENIX-3601 with the performance results I gathered in some 
Spark testing, but it would be awesome if other folks using Phoenix MR 
integration in some capacity could test this out. It's a bit of a 
double-whammy, since this patch gets us both node-awareness for the splits, and 
increases the potential parallelism by including the statistics-generated 
scans. cc [~maghamraviki...@gmail.com] [~ndimiduk] [~sergey.soldatov] [~elserj] 
[~jamestaylor], perhaps others :)

> Core MapReduce classes don't provide location info
> --
>
> Key: PHOENIX-3600
> URL: https://issues.apache.org/jira/browse/PHOENIX-3600
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3600.patch
>
>
> The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} 
> and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region 
> size or location information, leaving the execution engine (MR, Spark, etc.) 
> to randomly assign splits to nodes.
> Interestingly, the phoenix-hive module has reimplemented these classes, 
> including the node-aware functionality. We should port a subset of those 
> changes back to the core code so that other engines can make use of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3600) Core MapReduce classes don't provide location info

2017-01-18 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin updated PHOENIX-3600:
--
Attachment: PHOENIX-3600.patch

> Core MapReduce classes don't provide location info
> --
>
> Key: PHOENIX-3600
> URL: https://issues.apache.org/jira/browse/PHOENIX-3600
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3600.patch
>
>
> The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} 
> and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region 
> size or location information, leaving the execution engine (MR, Spark, etc.) 
> to randomly assign splits to nodes.
> Interestingly, the phoenix-hive module has reimplemented these classes, 
> including the node-aware functionality. We should port a subset of those 
> changes back to the core code so that other engines can make use of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3336) get the wrong results when using the local secondary index

2017-01-18 Thread Houliang Qi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houliang Qi updated PHOENIX-3336:
-
Description: 
When using a Phoenix local secondary index, if two clients concurrently upsert 
data to the same row key while another client uses the indexed column to retrieve 
data, that reader gets wrong results.

As shown in the attachments, I create a table called orders_5 and a local index 
on it called clerk_5. I then use two clients to upsert data to the same row key 
of orders_5. The local index clerk_5 ends up with some stale records (which may 
be acceptable under eventual consistency). However, when you query using the 
previous value of the indexed column, you can still get a result, and worse, the 
result is wrong: it is neither the record that was inserted before nor the record 
currently in the primary table (in this case, orders_5).


  was:
When using a Phoenix local secondary index, if two clients concurrently upsert 
data to the same row key while the indexed column is used to retrieve data, the 
reader gets wrong results.

As shown in the attachments, I create a table called orders_5 and a local index 
on it called clerk_5. I then use two clients to upsert data to the same row key 
of orders_5. The local index clerk_5 ends up with some stale records (which may 
be acceptable under eventual consistency). However, when you query using the 
previous value of the indexed column, you can still get a result, and worse, the 
result is wrong: it is neither the record that was inserted before nor the record 
currently in the primary table (in this case, orders_5).
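
A hypothetical reproduction sketch of the scenario via JDBC is below; the column names and connection URL are made up for illustration, and the real schema is in the attached create_table_orders.sql.

{code:java}
// Hypothetical repro sketch; ORDER_ID/CLERK and the JDBC URL are illustrative only.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalIndexStaleReadRepro {
    private static final String URL = "jdbc:phoenix:localhost:2181"; // assumed quorum

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS ORDERS_5 "
                    + "(ORDER_ID BIGINT PRIMARY KEY, CLERK VARCHAR)");
            stmt.execute("CREATE LOCAL INDEX IF NOT EXISTS CLERK_5 ON ORDERS_5(CLERK)");
            conn.commit();
        }

        // Two writers race on the same row key with different CLERK values.
        Thread w1 = new Thread(() -> upsertLoop("clerk-A"));
        Thread w2 = new Thread(() -> upsertLoop("clerk-B"));
        w1.start();
        w2.start();
        w1.join();
        w2.join();

        // A third client reads through the local index using one of the written values.
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT ORDER_ID, CLERK FROM ORDERS_5 WHERE CLERK = ?")) {
            ps.setString(1, "clerk-A");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // With a stale index entry, the row returned here can be neither
                    // the value last written nor the current row in ORDERS_5.
                    System.out.println(rs.getLong(1) + " -> " + rs.getString(2));
                }
            }
        }
    }

    private static void upsertLoop(String clerk) {
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement("UPSERT INTO ORDERS_5 VALUES (?, ?)")) {
            for (int i = 0; i < 1000; i++) {
                ps.setLong(1, 1L); // same row key from both writers
                ps.setString(2, clerk);
                ps.executeUpdate();
                conn.commit();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
{code}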



> get the wrong results when using the local secondary index
> --
>
> Key: PHOENIX-3336
> URL: https://issues.apache.org/jira/browse/PHOENIX-3336
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
> Environment: hbase-1.1.2
>Reporter: Houliang Qi
>  Labels: phoenix, secondaryIndex
> Attachments: create_table_orders.sql, readme.txt, sample_1.csv, 
> sample_2.csv, wrong-index-2.png
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> When using a Phoenix local secondary index, if two clients concurrently upsert 
> data to the same row key while another client uses the indexed column to 
> retrieve data, that reader gets wrong results.
> As shown in the attachments, I create a table called orders_5 and a local index 
> on it called clerk_5. I then use two clients to upsert data to the same row key 
> of orders_5. The local index clerk_5 ends up with some stale records (which may 
> be acceptable under eventual consistency). However, when you query using the 
> previous value of the indexed column, you can still get a result, and worse, 
> the result is wrong: it is neither the record that was inserted before nor the 
> record currently in the primary table (in this case, orders_5).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3134) varbinary fields bulk load difference between MR/psql and upserts

2017-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827706#comment-15827706
 ] 

Hudson commented on PHOENIX-3134:
-

SUCCESS: Integrated in Jenkins build Phoenix-master #1535 (See 
[https://builds.apache.org/job/Phoenix-master/1535/])
PHOENIX-3134 varbinary fields bulk load difference between MR/psql and 
(ankitsinghal59: rev a675211909415ca376e432d25f8a8822aadf5712)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/csv/CsvUpsertExecutorTest.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/types/PBinary.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkImportUtil.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/AbstractUpsertExecutorTest.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/util/json/JsonUpsertExecutor.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/json/JsonUpsertExecutorTest.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/types/PVarbinary.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/util/csv/CsvUpsertExecutor.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/expression/function/EncodeFormat.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/mapreduce/CsvBulkImportUtilTest.java


> varbinary fields bulk load difference between MR/psql and upserts
> -
>
> Key: PHOENIX-3134
> URL: https://issues.apache.org/jira/browse/PHOENIX-3134
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Sergey Soldatov
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3134.patch, PHOENIX-3134_v1.patch
>
>
> At the moment we have strange difference between how MR/psql upload and 
> upsert handles varbinary. MR/ psql expects that it's base64 encoded whereas 
> upsert takes input as a string. Should we add an option to load it as a plain 
> data or base64 in MR/psql?  
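
To make the difference concrete, a tiny illustration of how the same CSV field for a VARBINARY column ends up as different bytes on the two paths (the value is made up):

{code:java}
// Illustrative only: the two interpretations of one CSV field for a VARBINARY column.
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class VarbinaryEncodingExample {
    public static void main(String[] args) {
        String csvField = "aGVsbG8=";  // the raw text as it appears in the CSV file

        // MR bulk load / psql path: the field is treated as Base64 text.
        byte[] viaBulkLoad = Base64.getDecoder().decode(csvField);     // bytes of "hello"

        // UPSERT-through-JDBC path: the field is treated as a literal string.
        byte[] viaUpsert = csvField.getBytes(StandardCharsets.UTF_8);  // bytes of "aGVsbG8="

        System.out.println(Arrays.equals(viaBulkLoad, viaUpsert));     // prints false
    }
}
{code}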



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal resolved PHOENIX-3548.

Resolution: Fixed

committed as part of PHOENIX-3134.


> Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
> --
>
> Key: PHOENIX-3548
> URL: https://issues.apache.org/jira/browse/PHOENIX-3548
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py

2017-01-18 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-3548:
---
Fix Version/s: 4.10.0

> Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
> --
>
> Key: PHOENIX-3548
> URL: https://issues.apache.org/jira/browse/PHOENIX-3548
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)