[jira] [Commented] (PHOENIX-3609) Detect and fix corrupted local index region during compaction
[ https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829325#comment-15829325 ] Hadoop QA commented on PHOENIX-3609: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12848231/PHOENIX-3609.patch against master branch at commit a675211909415ca376e432d25f8a8822aadf5712. ATTACHMENT ID: 12848231 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +PhoenixConnection conn = DriverManager.getConnection(getUrl()).unwrap(PhoenixConnection.class); +try (HTableInterface metaTable = conn.getQueryServices().getTable(TableName.META_TABLE_NAME.getName()); +statement.execute("upsert into " + tableName + " values(" + i + ",'fn" + i + "','ln" + i + "')"); +private void copyLocalIndexHFiles(Configuration conf, HRegionInfo fromRegion, HRegionInfo toRegion, boolean move) +Path seondRegion = new Path(HTableDescriptor.getTableDir(root, fromRegion.getTableName()) + Path.SEPARATOR +Path hfilePath = FSUtils.getCurrentFileSystem(conf).listFiles(seondRegion, true).next().getPath(); +Path firstRegionPath = new Path(HTableDescriptor.getTableDir(root, toRegion.getTableName()) + Path.SEPARATOR +assertTrue(FileUtil.copy(currentFileSystem, hfilePath, currentFileSystem, firstRegionPath, move, conf)); +List scanners, ScanType scanType, long smallestReadPoint, long earliestPutTs, +super(store, store.getScanInfo(), scan, scanners, scanType, smallestReadPoint, earliestPutTs); {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//testReport/ Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/735//console This message is automatically generated. > Detect and fix corrupted local index region during compaction > - > > Key: PHOENIX-3609 > URL: https://issues.apache.org/jira/browse/PHOENIX-3609 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3609.patch > > > Local index regions can be corrupted when hbck is run to fix overlapping > regions and their directories are simply merged to create a single region. > We can detect this during compaction by looking at the start key of each > store file and comparing its prefix with the region start key. If the local > index for the region is found to be inconsistent, we will read the store > files of the corresponding data region and recreate the local index data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded
[ https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3549: --- Fix Version/s: 4.10.0 > Fail if binary field in CSV is not properly base64 encoded > -- > > Key: PHOENIX-3549 > URL: https://issues.apache.org/jira/browse/PHOENIX-3549 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
[ https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal reopened PHOENIX-3548: Sorry, closed this by mistake. The documentation is yet to get in, so keeping it open. > Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py > -- > > Key: PHOENIX-3548 > URL: https://issues.apache.org/jira/browse/PHOENIX-3548 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded
[ https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829319#comment-15829319 ] Ankit Singhal commented on PHOENIX-3549: This is committed as a part of PHOENIX-3134. > Fail if binary field in CSV is not properly base64 encoded > -- > > Key: PHOENIX-3549 > URL: https://issues.apache.org/jira/browse/PHOENIX-3549 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-3549) Fail if binary field in CSV is not properly base64 encoded
[ https://issues.apache.org/jira/browse/PHOENIX-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal resolved PHOENIX-3549. Resolution: Fixed > Fail if binary field in CSV is not properly base64 encoded > -- > > Key: PHOENIX-3549 > URL: https://issues.apache.org/jira/browse/PHOENIX-3549 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3610) Fix tableName used to get the index maintainers while creating HalfStoreFileReader for local index store
[ https://issues.apache.org/jira/browse/PHOENIX-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3610: --- Description: The physical table name is used instead of the Phoenix table name. IndexHalfStoreFileReaderGenerator#preStoreFileReaderOpen {code} TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName(); .. try { conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap( PhoenixConnection.class); PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString()); List<PTable> indexes = dataTable.getIndexes(); Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers = new HashMap<ImmutableBytesWritable, IndexMaintainer>(); for (PTable index : indexes) { if (index.getIndexType() == IndexType.LOCAL) { IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn); indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer); } } if(indexMaintainers.isEmpty()) return reader; byte[][] viewConstants = getViewConstants(dataTable); return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx .getEnvironment().getConfiguration(), indexMaintainers, viewConstants, childRegion, regionStartKeyInHFile, splitKey); {code} was: The physical table name is used instead of the Phoenix table name. {code} TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName(); .. try { conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap( PhoenixConnection.class); PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString()); List<PTable> indexes = dataTable.getIndexes(); Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers = new HashMap<ImmutableBytesWritable, IndexMaintainer>(); for (PTable index : indexes) { if (index.getIndexType() == IndexType.LOCAL) { IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn); indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer); } } if(indexMaintainers.isEmpty()) return reader; byte[][] viewConstants = getViewConstants(dataTable); return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx .getEnvironment().getConfiguration(), indexMaintainers, viewConstants, childRegion, regionStartKeyInHFile, splitKey); {code} > Fix tableName used to get the index maintainers while creating > HalfStoreFileReader for local index store > > > Key: PHOENIX-3610 > URL: https://issues.apache.org/jira/browse/PHOENIX-3610 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > > The physical table name is used instead of the Phoenix table name. > IndexHalfStoreFileReaderGenerator#preStoreFileReaderOpen > {code} > TableName tableName = > ctx.getEnvironment().getRegion().getTableDesc().getTableName(); > ..
> try { > conn = > QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap( > PhoenixConnection.class); > PTable dataTable = PhoenixRuntime.getTableNoCache(conn, > tableName.getNameAsString()); > List<PTable> indexes = dataTable.getIndexes(); > Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers = > new HashMap<ImmutableBytesWritable, IndexMaintainer>(); > for (PTable index : indexes) { > if (index.getIndexType() == IndexType.LOCAL) { > IndexMaintainer indexMaintainer = > index.getIndexMaintainer(dataTable, conn); > indexMaintainers.put(new > ImmutableBytesWritable(MetaDataUtil > .getViewIndexIdDataType().toBytes(index.getViewIndexId())), > indexMaintainer); > } > } > if(indexMaintainers.isEmpty()) return reader; > byte[][] viewConstants = getViewConstants(dataTable); > return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx > .getEnvironment().getConfiguration(), indexMaintainers, viewConstants, > childRegion, regionStartKeyInHFile, splitKey); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3610) Fix tableName used to get the index maintainers while creating HalfStoreFileReader for local index store
Ankit Singhal created PHOENIX-3610: -- Summary: Fix tableName used to get the index maintainers while creating HalfStoreFileReader for local index store Key: PHOENIX-3610 URL: https://issues.apache.org/jira/browse/PHOENIX-3610 Project: Phoenix Issue Type: Bug Reporter: Ankit Singhal Assignee: Ankit Singhal Fix For: 4.10.0 The physical table name is used instead of the Phoenix table name. {code} TableName tableName = ctx.getEnvironment().getRegion().getTableDesc().getTableName(); .. try { conn = QueryUtil.getConnectionOnServer(ctx.getEnvironment().getConfiguration()).unwrap( PhoenixConnection.class); PTable dataTable = PhoenixRuntime.getTableNoCache(conn, tableName.getNameAsString()); List<PTable> indexes = dataTable.getIndexes(); Map<ImmutableBytesWritable, IndexMaintainer> indexMaintainers = new HashMap<ImmutableBytesWritable, IndexMaintainer>(); for (PTable index : indexes) { if (index.getIndexType() == IndexType.LOCAL) { IndexMaintainer indexMaintainer = index.getIndexMaintainer(dataTable, conn); indexMaintainers.put(new ImmutableBytesWritable(MetaDataUtil .getViewIndexIdDataType().toBytes(index.getViewIndexId())), indexMaintainer); } } if(indexMaintainers.isEmpty()) return reader; byte[][] viewConstants = getViewConstants(dataTable); return new IndexHalfStoreFileReader(fs, p, cacheConf, in, size, r, ctx .getEnvironment().getConfiguration(), indexMaintainers, viewConstants, childRegion, regionStartKeyInHFile, splitKey); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
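A hedged illustration of the mismatch described above (not taken from the patch; the namespace-mapping case is an assumption used for concreteness): the HBase descriptor yields the physical table name, which is not the form Phoenix metadata lookups expect, so resolving the PTable (and therefore the local index maintainers) by that name can fail.
{code}
// Hedged illustration only. One way the physical and Phoenix names can
// diverge is namespace mapping, where HBase separates schema and table
// with ':' while Phoenix uses '.'.
import org.apache.hadoop.hbase.TableName;

public class NameMismatchExample {
    public static void main(String[] args) {
        TableName physical = TableName.valueOf("MYSCHEMA", "MYTABLE");
        String physicalName = physical.getNameAsString(); // "MYSCHEMA:MYTABLE"
        // Phoenix metadata identifies the table as "MYSCHEMA.MYTABLE", so a
        // lookup such as PhoenixRuntime.getTableNoCache(conn, physicalName)
        // would not resolve the PTable or its local indexes.
        String phoenixName = physicalName.replace(':', '.');
        System.out.println(physicalName + " vs " + phoenixName);
    }
}
{code}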
[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction
[ https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3609: --- Attachment: PHOENIX-3609.patch > Detect and fix corrupted local index region during compaction > - > > Key: PHOENIX-3609 > URL: https://issues.apache.org/jira/browse/PHOENIX-3609 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3609.patch > > > Local index regions can be corrupted when hbck is run to fix overlapping > regions and their directories are simply merged to create a single region. > We can detect this during compaction by looking at the start key of each > store file and comparing its prefix with the region start key. If the local > index for the region is found to be inconsistent, we will read the store > files of the corresponding data region and recreate the local index data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction
[ https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3609: --- Attachment: (was: BUG-67084_v1.patch) > Detect and fix corrupted local index region during compaction > - > > Key: PHOENIX-3609 > URL: https://issues.apache.org/jira/browse/PHOENIX-3609 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3609.patch > > > Local index regions can be corrupted when hbck is run to fix overlapping > regions and their directories are simply merged to create a single region. > We can detect this during compaction by looking at the start key of each > store file and comparing its prefix with the region start key. If the local > index for the region is found to be inconsistent, we will read the store > files of the corresponding data region and recreate the local index data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3609) Detect and fix corrupted local index region during compaction
[ https://issues.apache.org/jira/browse/PHOENIX-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3609: --- Attachment: BUG-67084_v1.patch [~chrajeshbab...@gmail.com], could you please review the attached patch? -- The current check just compares the prefix of each store file's first row key against the region start key and rebuilds the region using the data table's store files. -- It can give a false negative in some cases where the data happens to be laid out such that the prefix matches anyway; those cases are not yet handled. I may check the indexId and see whether it is less than the highest indexId. > Detect and fix corrupted local index region during compaction > - > > Key: PHOENIX-3609 > URL: https://issues.apache.org/jira/browse/PHOENIX-3609 > Project: Phoenix > Issue Type: Bug >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Attachments: BUG-67084_v1.patch > > > Local index regions can be corrupted when hbck is run to fix overlapping > regions and their directories are simply merged to create a single region. > We can detect this during compaction by looking at the start key of each > store file and comparing its prefix with the region start key. If the local > index for the region is found to be inconsistent, we will read the store > files of the corresponding data region and recreate the local index data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3609) Detect and fix corrupted local index region during compaction
Ankit Singhal created PHOENIX-3609: -- Summary: Detect and fix corrupted local index region during compaction Key: PHOENIX-3609 URL: https://issues.apache.org/jira/browse/PHOENIX-3609 Project: Phoenix Issue Type: Bug Reporter: Ankit Singhal Assignee: Ankit Singhal Local index regions can be corrupted when hbck is run to fix overlapping regions and their directories are simply merged to create a single region. We can detect this during compaction by looking at the start key of each store file and comparing its prefix with the region start key. If the local index for the region is found to be inconsistent, we will read the store files of the corresponding data region and recreate the local index data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
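A minimal sketch of the prefix check the description outlines (illustrative names, not the attached patch): a local index store file whose first row key does not begin with the region start key was most likely merged in from another region.
{code}
// Minimal sketch of the consistency check described above; illustrative,
// not the attached patch.
import org.apache.hadoop.hbase.util.Bytes;

public class LocalIndexConsistencyCheck {
    static boolean isStoreFileInconsistent(byte[] regionStartKey, byte[] firstRowKey) {
        if (regionStartKey == null || regionStartKey.length == 0) {
            return false; // first region of the table: every key is a valid prefix
        }
        if (firstRowKey == null || firstRowKey.length < regionStartKey.length) {
            return true;  // row key too short to carry the region prefix
        }
        // The first row key must start with the region start key; otherwise the
        // file belongs to a merged-in region and the local index needs a rebuild.
        return Bytes.compareTo(firstRowKey, 0, regionStartKey.length,
                regionStartKey, 0, regionStartKey.length) != 0;
    }
}
{code}
As the review comment above notes, a coincidental prefix match turns this into a false negative, which is why checking the indexId was floated as a refinement.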
[jira] [Commented] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE
[ https://issues.apache.org/jira/browse/PHOENIX-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829297#comment-15829297 ] Hadoop QA commented on PHOENIX-3608: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12848212/PHOENIX-3608.patch against master branch at commit a675211909415ca376e432d25f8a8822aadf5712. ATTACHMENT ID: 12848212 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. The patch failed these unit tests: ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.tx.TransactionIT Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//testReport/ Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/734//console This message is automatically generated. > KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE > > > Key: PHOENIX-3608 > URL: https://issues.apache.org/jira/browse/PHOENIX-3608 > Project: Phoenix > Issue Type: Bug >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla > Fix For: 4.10.0, 4.8.2 > > Attachments: PHOENIX-3608.patch > > > Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when > the left-side key range is null and the right side is not, but we should > return EMPTY_RANGE when either key range is null. > {noformat} > if (this == IS_NULL_RANGE) { > if (range == IS_NULL_RANGE) { > return IS_NULL_RANGE; > } > return EMPTY_RANGE; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE
[ https://issues.apache.org/jira/browse/PHOENIX-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajeshbabu Chintaguntla updated PHOENIX-3608: - Attachment: PHOENIX-3608.patch Here is a trivial patch. [~jamestaylor], please review. > KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE > > > Key: PHOENIX-3608 > URL: https://issues.apache.org/jira/browse/PHOENIX-3608 > Project: Phoenix > Issue Type: Bug >Reporter: Rajeshbabu Chintaguntla >Assignee: Rajeshbabu Chintaguntla > Attachments: PHOENIX-3608.patch > > > Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when > the left-side key range is null and the right side is not, but we should > return EMPTY_RANGE when either key range is null. > {noformat} > if (this == IS_NULL_RANGE) { > if (range == IS_NULL_RANGE) { > return IS_NULL_RANGE; > } > return EMPTY_RANGE; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3608) KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE
Rajeshbabu Chintaguntla created PHOENIX-3608: Summary: KeyRange intersect should return EMPTY_RANGE when one of them is NULL_RANGE Key: PHOENIX-3608 URL: https://issues.apache.org/jira/browse/PHOENIX-3608 Project: Phoenix Issue Type: Bug Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Currently KeyRange.intersect(KeyRange range) returns EMPTY_RANGE only when the left-side key range is null and the right side is not, but we should return EMPTY_RANGE when either key range is null. {noformat} if (this == IS_NULL_RANGE) { if (range == IS_NULL_RANGE) { return IS_NULL_RANGE; } return EMPTY_RANGE; } {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
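A hedged sketch of the symmetric handling the description asks for (not necessarily the attached patch): EMPTY_RANGE whenever exactly one side is the null range, IS_NULL_RANGE only when both sides are.
{code}
// Hedged sketch of the fixed check inside KeyRange.intersect(KeyRange range);
// the null-range test is now symmetric in both operands.
if (this == IS_NULL_RANGE) {
    return range == IS_NULL_RANGE ? IS_NULL_RANGE : EMPTY_RANGE;
}
if (range == IS_NULL_RANGE) {
    return EMPTY_RANGE;
}
// ... existing intersection logic for non-null ranges ...
{code}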
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829044#comment-15829044 ] Andrew Purtell commented on PHOENIX-3607: - We can evict from the cache by LRU regardless, though I see that will be another JIRA, as it should be. We get Hadoop semantics up through HBase via User because User refers to UGI, and Phoenix shouldn't change them; testing User equality by name instead of Subject identity would be such a change. I suggested this in a private discussion but now believe it to be wrong. Allowing only one cached entry per user name is something Phoenix can choose to do, though it won't be necessary if we expire connections out of the cache by LRU in a reasonably timely manner. We will avoid leaks that way. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829028#comment-15829028 ] Geoffrey Jacoby edited comment on PHOENIX-3607 at 1/19/17 12:14 AM: The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. HADOOP-6670 introduced the System.Object hashing, and HADOOP-12529 was a proposal to switch the hashing back that several developers argued against. Apparently the Hadoop developers had concerns with allowing Users with semantically equal username/subject/principal to count as equal because their respective UGI objects could have different Credentials attached. They were apparently worried that caching based on those assumptions could lead to either access denied errors or privilege escalations from clients getting connections with the wrong authentication. was (Author: gjacoby): The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. HADOOP-6670 introduced the System.Object hashing, and Apparently the Hadoop developers had concerns with allowing Users with semantically equal username/subject/principal to count as equal because their respective UGI objects could have different Credentials attached. They were apparently worried that caching based on those assumptions could lead to either access denied errors or privilege escalations from clients getting connections with the wrong authentication. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829028#comment-15829028 ] Geoffrey Jacoby commented on PHOENIX-3607: -- The discussion in https://issues.apache.org/jira/browse/HADOOP-6670 and https://issues.apache.org/jira/browse/HADOOP-12529 gives some more context. HADOOP-6670 introduced the System.Object hashing, and HADOOP-12529 was a proposal to switch the hashing back that several developers argued against. Apparently the Hadoop developers had concerns with allowing Users with semantically equal username/subject/principal to count as equal because their respective UGI objects could have different Credentials attached. They were apparently worried that caching based on those assumptions could lead to either access denied errors or privilege escalations from clients getting connections with the wrong authentication. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3519) Add COLUMN_ENCODED_BYTES table property
[ https://issues.apache.org/jira/browse/PHOENIX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3519: Attachment: PHOENIX-3519-v2.patch [~jamestaylor] Thanks for the feedback, I have attached a v2 patch with review changes. When we add a column that already exists in a child view to the base table, the column is propagated to the child views, and the existing column's ordinal position in the child view might change to match that of the base table. For tables with encoded column qualifiers we don't allow users to add an existing view column to the base table. I changed the default behavior to not use encoded column qualifiers in this patch (and reverted this change in PHOENIX-3586). I had to make the following change in MetadataClient.createTable, as the PostDDLCompiler plan was not being run for transactional tables, so stats were not available after a transactional table was created. I changed the check to be {code} if (isImmutableRows && encodingScheme != NON_ENCODED_QUALIFIERS) { // force store nulls to true so delete markers aren't used storeNulls = true; tableProps.put(PhoenixDatabaseMetaData.STORE_NULLS, Boolean.TRUE); } {code} I have added a unit test for "NONE". > Add COLUMN_ENCODED_BYTES table property > --- > > Key: PHOENIX-3519 > URL: https://issues.apache.org/jira/browse/PHOENIX-3519 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Fix For: 4.10.0 > > Attachments: PHOENIX-3519.patch, PHOENIX-3519-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Phoenix ODBC user experience
Hi There, If you are using the Phoenix ODBC driver, could you please share your experience in terms of stability of the setup (since it requires the Query Server) and read/write performance? Performance is subjective, but this will give us some data points to assess the option of using the ODBC driver with C++. Thanks in advance for your feedback. - Biju
[jira] [Comment Edited] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828857#comment-15828857 ] Thomas D'Silva edited comment on PHOENIX-3586 at 1/18/17 10:11 PM: --- I have attached a v3 patch which sets the immutableStorageScheme in SingleCellColumnExpression by reading the last byte of the serialized array. I also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. I forgot to mention that these serialization changes are backward compatible because SingleCellColumnExpression was newly added. I was going to parameterize more tests in a separate JIRA PHOENIX-3446. was (Author: tdsilva): I have attached a v3 patch which sets the immutableStorageScheme in SingleCellColumnExpression by reading the last byte of the serialized array. I also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. I forgot to mention that these serialization changes are backward compatible because SingleCellColumnExpression was newly added. I was going to parameterize more tests in a separate JIRA PHOENIX-3519. > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch, > PHOENIX-3586-v3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3586: Attachment: PHOENIX-3586-v3.patch I have attached a v3 patch which sets the immutableStorageScheme in SingleCellColumnExpression by reading the last byte of the serialized array. I also changed the IMMUTABLE_STORAGE_SCHEME table property to be mutable. I forgot to mention that these serialization changes are backward compatible because SingleCellColumnExpression was newly added. I was going to parameterize more tests in a separate JIRA PHOENIX-3519. > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch, > PHOENIX-3586-v3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
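A hedged sketch of the "last byte" idea from the comment above (the accessor and lookup names here are assumptions, not necessarily the actual Phoenix API): because the serialization version is the trailing byte of the serialized array, the expression can recover the storage scheme from the data itself.
{code}
// Hedged sketch; 'ptr' is assumed to be the ImmutableBytesWritable holding
// the serialized single-cell array, and both method names are illustrative.
byte[] serializedArray = ptr.copyBytes();                        // assumed accessor
byte serializationVersion = serializedArray[serializedArray.length - 1];
this.immutableStorageScheme =
        ImmutableStorageScheme.fromSerializedValue(serializationVersion); // assumed lookup
{code}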
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828822#comment-15828822 ] Geoffrey Jacoby commented on PHOENIX-3607: -- So then I think it comes down to a simple question: what should happen to the cache if a ConnectionInfo with a new-but-semantically-identical User comes along? If the correct answer is "We shouldn't reuse the existing cached CQSI because ConnectionInfos are immutable and the new User should make it count as a different ConnectionInfo", then we should create a new CQSI, but the code should synchronously make sure that any semantically-identical ConnectionInfos are explicitly expired out of the cache at that time so we don't leak. On the other hand, if the correct answer is "We should reuse the existing cached CQSI", then we should change the hashCode as proposed in this JIRA, so that the new ConnectionInfo hashes to the same value as the old one. Or, to put it another way: should we use object or semantic equality for ConnectionInfo objects? Either way, I think we should go forward with the LRU expiration of the cache. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
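For the LRU expiration idea, a sketch of what the cache could look like using Guava's com.google.common.cache package (the timeout and the close-on-eviction behavior are assumptions, not a committed design):
{code}
// Hedged sketch of an idle-expiry cache for CQSI instances; uses
// com.google.common.cache.{Cache, CacheBuilder, RemovalListener,
// RemovalNotification} and java.util.concurrent.TimeUnit.
Cache<ConnectionInfo, ConnectionQueryServices> cqsiCache = CacheBuilder.newBuilder()
    .expireAfterAccess(24, TimeUnit.HOURS) // illustrative idle timeout
    .removalListener(new RemovalListener<ConnectionInfo, ConnectionQueryServices>() {
        @Override
        public void onRemoval(RemovalNotification<ConnectionInfo, ConnectionQueryServices> n) {
            try {
                n.getValue().close(); // release the underlying HConnection
            } catch (Exception e) {
                // best effort on eviction
            }
        }
    })
    .build();
{code}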
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828733#comment-15828733 ] Josh Elser commented on PHOENIX-3607: - Thanks, Geoffrey. This is super helpful. bq. 1. Client requests a secure connection. The PhoenixEmbeddedDriver code logs them in, creates a ConnectionInfo, caches it, and then returns the connection to the client. Can you be more specific here? Does this mean that you're using the automatic Kerberos login via the JDBC url properties? If so, PHOENIX-3232 is relevant. I'm still trying to re-wrap my head around the problem. I'm thinking that it's related to the final User field on the ConnectionInfo. I really don't think that Phoenix is managing these credentials correctly. PHOENIX-3126 was the original case which outlined this. Maybe ConnectionInfos should only use the user name to reference the ConnectionQueryServices object, but this doesn't do anything to prevent the {{UGI.getCurrentUser()}} from not matching what is specified in the ConnectionInfo. Along these lines, a comment I left to try to explain this. {noformat} // PHOENIX-3189 Because ConnectionInfo is immutable, we must make sure all parts of it are correct before // construction; this also requires the Kerberos user credentials object (since they are compared by reference // and not by value. If the user provided a principal and keytab via the JDBC url, we must make sure that the // Kerberos login happens *before* we construct the ConnectionInfo object. Otherwise, the use of ConnectionInfo // to determine when ConnectionQueryServices impl's should be reused will be broken. {noformat} bq. 2. Under a separate JIRA, we'd like to make the ConnectionInfo/CQSI cache gradually expire entries in LRU fashion. This sounds like a good idea. Hopefully, we can do this in a way that doesn't cause clients to do anything special when the elements are evicted. Finally, sorry for leaving this mess in a bad state -- LMK how I can help. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3584) Expose metrics for ConnectionQueryServices instances and their allocators in the JVM
[ https://issues.apache.org/jira/browse/PHOENIX-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828715#comment-15828715 ] Sergey Soldatov commented on PHOENIX-3584: -- Can this trace logging in ConnectionQueryServicesImpl be done in another way? When the Spark shell is started with Phoenix, those two pages of exceptions may lead to a heart attack. > Expose metrics for ConnectionQueryServices instances and their allocators in > the JVM > > > Key: PHOENIX-3584 > URL: https://issues.apache.org/jira/browse/PHOENIX-3584 > Project: Phoenix > Issue Type: Improvement >Reporter: Andrew Purtell >Assignee: Samarth Jain > Fix For: 4.9.1, 4.10.0 > > Attachments: PHOENIX-3584.patch > > > In the case a client is leaking Phoenix/HBase connections it would be helpful > to have metrics available on the number of ConnectionQueryServices > (ConnectionQueryServicesImpl) instances and who has allocated them. > For the latter, we could get a stacktrace when ConnectionQueryServicesImpls > are allocated (should be relatively rare) and keep a count by hash of the > call stack (and save the call stack). Then we need a method to dump the hash > to callstack map as a string. This method can be called remotely by JMX when > debugging leaks in a live environment. Perhaps after the count of > ConnectionQueryServicesImpls goes over a configurable threshold we can also > log warnings that dump the counts by hash and callstacks corresponding to > those hashes. > Or, we should only have multiple ConnectionQueryServicesImpls if an optional > parameter is passed in the JDBC connect string. We could keep counts by that > parameter string and dump that instead of call stacks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3519) Add COLUMN_ENCODED_BYTES table property
[ https://issues.apache.org/jira/browse/PHOENIX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828692#comment-15828692 ] James Taylor commented on PHOENIX-3519: --- Thanks for the patch, [~tdsilva]. Here's some feedback: - Is this because we'll allocate a new slot for the new column and they won't line up based on name now? I think we should explain that. {code} +// For the non-diverged view, adding the column VIEW_COL2 will end up changing its ordinal position in the view. {code} - Why is this change needed? {code} -if (table == null || table.getType() == PTableType.VIEW || table.isTransactional()) { +if (table == null || table.getType() == PTableType.VIEW /*|| table.isTransactional()*/) { {code} - Is this only when we store everything in a single KeyValue? {code} +if (isImmutableRows) { +// force store nulls to true so delete markers aren't used +storeNulls = true; +tableProps.put(PhoenixDatabaseMetaData.STORE_NULLS, Boolean.TRUE); +} + {code} - Do we have a unit test for usage of NONE? {code} + if ("NONE".equalsIgnoreCase(strValue)) { + return 0; + } {code} > Add COLUMN_ENCODED_BYTES table property > --- > > Key: PHOENIX-3519 > URL: https://issues.apache.org/jira/browse/PHOENIX-3519 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Fix For: 4.10.0 > > Attachments: PHOENIX-3519.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828684#comment-15828684 ] Geoffrey Jacoby commented on PHOENIX-3607: -- [~elserj] We're seeing Phoenix connections leak over time in secure clusters. One use case we're concerned about is: 1. Client requests a secure connection. The PhoenixEmbeddedDriver code logs them in, creates a ConnectionInfo, caches it, and then returns the connection to the client. 2. Sometime later, the client requests a secure connection again with the same credentials. ConnectionInfo#normalize realizes that the Kerberos ticket has expired and re-logs in with a different User object and a different Subject, producing a ConnectionInfo that is semantically equal to the one in step 1. 3. It tries to look up a cached ConnectionQueryServicesImpl/HConnection in the cache, but it misses because the hash codes of the two ConnectionInfos are different. A new ConnectionQueryServicesImpl and HConnection are created, and put in the cache. 4. The old cache entry from step 1 will never be used or purged, and so its CQSI and HConnection leak. The two changes we're proposing as fixes are: 1. Change the hashing function of ConnectionInfo so that semantically identical instances will return the same hash code even if their User subjects are different instances (e.g. use user.getName() instead of User in ConnectionInfo.hashCode()) 2. Under a separate JIRA, we'd like to make the ConnectionInfo/CQSI cache gradually expire entries in LRU fashion. [~apurtell] > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
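A sketch of proposed change 1 (the surrounding field handling is illustrative, not the actual ConnectionInfo code): hash the stable user-name string instead of the User object, so a re-login by the same principal hashes to the same cache entry.
{code}
// Hedged sketch of ConnectionInfo.hashCode(); other fields elided.
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    // ... combine the other ConnectionInfo fields (quorum, port, root node, ...) ...
    // Before: user.hashCode(), which falls back to the identity hash of the
    // login Subject and changes after every TGT refresh / re-login.
    result = prime * result + ((user == null) ? 0 : user.getName().hashCode());
    return result;
}
{code}
equals() would need the matching change for the hashCode/equals contract to hold.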
[jira] [Commented] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828665#comment-15828665 ] James Taylor commented on PHOENIX-3586: --- bq. If we derive the immutableStorageScheme from the serialized array, then all future serialization formats will need to store the enum ordinal in the serialized array. It is only written and read once. We already serialize a byte for the format, right? Can this correspond to the enum ordinal? It would be nice if it were the last byte in the serialized bytes so we know where to find it. I think we need it in the data itself since, if/when it changes, there'll potentially be a mix of old and new on a row-by-row basis. WDYT? Are we thinking of parameterizing more tests? Like IndexIT, perhaps? > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828655#comment-15828655 ] Josh Elser commented on PHOENIX-3607: - I think there were issues WRT proxy-user authentication in this case. When the end-user was riding on top of some other user's credentials (I think this was ultimately stemming from an Apache Zeppelin use-case), it was leaking connections. I also remember the convenience JDBC URL principal+keytab properties being problematic. Is there a concrete case that you can share, [~gjacoby]? It would be good to have something firm that we can show does/does-not work, and then work from there. In general, this is rather difficult to do correctly due to the caching that PhoenixDriver tries to do. The inversion of control WRT the {{Subject.doAs}} calls makes it tough for Phoenix to encapsulate this while supporting multiple users in the same JVM. > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3586: Attachment: PHOENIX-3586-v2.patch > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3586: Attachment: (was: PHOENIX-3586-v2.patch) > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3586) Add StorageScheme table property to allow users to specify their custom storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3586: Attachment: PHOENIX-3586-v2.patch [~jamestaylor] Thanks for the review, I have attached a v2 patch. I refactored the ImmutableStorageScheme enum; previously the array data type builder was an instance variable, which was not thread safe. Now the enum implements the following interface: {code} +interface ColumnValueEncoderDecoderSupplier { +ColumnValueEncoder getEncoder(int numElements); +ColumnValueDecoder getDecoder(); } +public enum ImmutableStorageScheme implements ColumnValueEncoderDecoderSupplier { +ONE_CELL_PER_COLUMN((byte)1) { +@Override +public ColumnValueEncoder getEncoder(int numElements) { +throw new UnsupportedOperationException(); +} + +@Override +public ColumnValueDecoder getDecoder() { +throw new UnsupportedOperationException(); +} +}, +// stores a single cell per column family that contains all serialized column values +SINGLE_CELL_ARRAY_WITH_OFFSETS((byte)2) { +@Override +public ColumnValueEncoder getEncoder(int numElements) { +PDataType type = PVarbinary.INSTANCE; +int estimatedSize = PArrayDataType.estimateSize(numElements, type); +TrustedByteArrayOutputStream byteStream = new TrustedByteArrayOutputStream(estimatedSize); +DataOutputStream oStream = new DataOutputStream(byteStream); +return new PArrayDataTypeEncoder(byteStream, oStream, numElements, type, SortOrder.ASC, false, PArrayDataType.IMMUTABLE_SERIALIZATION_VERSION); +} + +@Override +public ColumnValueDecoder getDecoder() { +return new PArrayDataTypeDecoder(); +} +}; {code} Many of the test changes were because in PHOENIX-3519 I changed the default to not use encoded column qualifiers and the SINGLE_CELL_ARRAY_WITH_OFFSETS storage scheme. I reverted the change in this patch. This caused changes to AlterMultiTenantTableWithViewsIT, StatsCollectorIT and MutationStateTest. The following tests were parameterized: AlterTableIT, AlterTableWithViewsIT, StoreNullsIT, StatsCollectorIT. The following tests changed because I renamed StorageScheme to ImmutableStorageScheme: CorrelatePlanTest and LiteralResultIteratorPlanTest. I refactored ArrayConstructorExpressionTest and moved some of the other tests into a new test ImmutableStorageSchemeTest. I renamed PArrayDataTypeBytesArrayBuilder to PArrayDataTypeEncoder and so PDataTypeForArraysTest changed. [~samarthjain] is going to add tests for different # of bytes for the column qualifier. COLUMN_ENCODED_BYTES does have a "none" option in the patch I attached to PHOENIX-3519. If we derive the immutableStorageScheme from the serialized array, then all future serialization formats will need to store the enum ordinal in the serialized array. It is only written and read once. WritableUtils.writeEnum serializes the enum using the name; in the v2 patch I modified the code to use the enum ordinal. Samarth handled adding the IMMUTABLE_STORAGE_SCHEME column during the schema upgrade in PHOENIX-3447. > Add StorageScheme table property to allow users to specify their custom > storage schemes > --- > > Key: PHOENIX-3586 > URL: https://issues.apache.org/jira/browse/PHOENIX-3586 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3586.patch, PHOENIX-3586-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
[ https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828553#comment-15828553 ] James Taylor commented on PHOENIX-3607: --- [~elserj] - remember PHOENIX-3189 and the discussion we had on putting User in the hashCode of ConnectionInfo [1][2]? It's causing issues. Can you remind us why this was done? It seems that using user.getName() instead would not be good for PQS? [1] https://github.com/apache/phoenix/pull/191#issuecomment-242566530 [2] https://github.com/apache/phoenix/pull/191#issuecomment-243230922 > Change hashCode calculation for caching ConnectionQueryServicesImpls > > > Key: PHOENIX-3607 > URL: https://issues.apache.org/jira/browse/PHOENIX-3607 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0, 4.9.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby > > PhoenixDriver maintains a cache of ConnectionInfo -> > ConnectionQueryServicesImpl (each of which holds a single HConnection) : > The hash code of ConnectionInfo in part uses the hash code of its HBase User > object, which uses the *identity hash* of the Subject allocated at login. > There are concerns about the stability of this hashcode. When we log out and > log in after TGT refresh, will we have a new Subject? > To be defensive, we should do a hash of the string returned by user.getName() > instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls
Geoffrey Jacoby created PHOENIX-3607: Summary: Change hashCode calculation for caching ConnectionQueryServicesImpls Key: PHOENIX-3607 URL: https://issues.apache.org/jira/browse/PHOENIX-3607 Project: Phoenix Issue Type: Bug Affects Versions: 4.8.0, 4.9.0 Reporter: Geoffrey Jacoby Assignee: Geoffrey Jacoby PhoenixDriver maintains a cache of ConnectionInfo -> ConnectionQueryServicesImpl (each of which holds a single HConnection) : The hash code of ConnectionInfo in part uses the hash code of its HBase User object, which uses the *identity hash* of the Subject allocated at login. There are concerns about the stability of this hashcode. When we log out and log in after TGT refresh, will we have a new Subject? To be defensive, we should do a hash of the string returned by user.getName() instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info
[ https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828412#comment-15828412 ] Hadoop QA commented on PHOENIX-3600: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12848082/PHOENIX-3600.patch against master branch at commit a675211909415ca376e432d25f8a8822aadf5712. ATTACHMENT ID: 12848082 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +private List generateSplits(final QueryPlan qplan, final List splits, Configuration config) throws IOException { +org.apache.hadoop.hbase.client.Connection connection = ConnectionFactory.createConnection(config); +psplits.add(new PhoenixInputSplit(Lists.newArrayList(aScan), regionSize, regionLocation)); {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//testReport/ Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/733//console This message is automatically generated. > Core MapReduce classes don't provide location info > -- > > Key: PHOENIX-3600 > URL: https://issues.apache.org/jira/browse/PHOENIX-3600 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3600.patch > > > The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} > and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region > size or location information, leaving the execution engine (MR, Spark, etc.) > to randomly assign splits to nodes. > Interestingly, the phoenix-hive module has reimplemented these classes, > including the node-aware functionality. We should port a subset of those > changes back to the core code so that other engines can make use of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
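From the long lines the QA bot quotes, the new split carries a region size and location; a hedged sketch of what exposing those through the MapReduce API amounts to (simplified, not the actual patch):
{code}
// Hedged, simplified sketch; the real PhoenixInputSplit also carries the
// scans and implements Writable serialization.
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;

public class PhoenixInputSplitSketch extends InputSplit {
    private final long regionSize;
    private final String regionLocation; // hosting RegionServer, may be null

    public PhoenixInputSplitSketch(long regionSize, String regionLocation) {
        this.regionSize = regionSize;
        this.regionLocation = regionLocation;
    }

    @Override
    public long getLength() throws IOException {
        return regionSize; // lets the framework weigh splits by size
    }

    @Override
    public String[] getLocations() throws IOException {
        return regionLocation == null ? new String[0] : new String[] { regionLocation };
    }
}
{code}
This is what lets an engine such as Spark (see PHOENIX-3601 below) place partitions on the RegionServer that hosts the data.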
ApacheCon CFP closing soon (11 February)
Hello, fellow Apache enthusiast. Thanks for your participation, and interest in, the projects of the Apache Software Foundation. I wanted to remind you that the Call For Papers (CFP) for ApacheCon North America, and Apache: Big Data North America, closes in less than a month. If you've been putting it off because there was lots of time left, it's time to dig for that inspiration and get those talk proposals in. It's also time to discuss with your developer and user community whether there's a track of talks that you might want to propose, so that you have more complete coverage of your project than a talk or two. We're looking for talks directly, and indirectly, related to projects at the Apache Software Foundation. These can be anything from in-depth technical discussions of the projects you work with, to talks about community, documentation, legal issues, marketing, and so on. We're also very interested in talks about projects and services built on top of Apache projects, and case studies of how you use Apache projects to solve real-world problems. We are particularly interested in presentations from Apache projects either in the Incubator, or recently graduated. ApacheCon is where people come to find out what technology they'll be using this time next year. Important URLs are: To submit a talk for Apache: Big Data - http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp To submit a talk for ApacheCon - http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp To register for Apache: Big Data - http://events.linuxfoundation.org/events/apache-big-data-north-america/attend/register- To register for ApacheCon - http://events.linuxfoundation.org/events/apachecon-north-america/attend/register- Early Bird registration rates end March 12th, but if you're a committer on an Apache project, you get the low committer rate, which is less than half of the early bird rate! For further updated about ApacheCon, follow us on Twitter, @ApacheCon, or drop by our IRC channel, #apachecon on the Freenode IRC network. Or contact me - rbo...@apache.org - with any questions or concerns. Thanks! Rich Bowen, VP Conferences, Apache Software Foundation -- (You've received this email because you're on a dev@ or users@ mailing list of an Apache Software Foundation project. For subscription and unsubscription information, consult the headers of this email message, as this varies from one list to another.)
[jira] [Commented] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark
[ https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828357#comment-15828357 ] Josh Mahonin commented on PHOENIX-3601: --- Trivial patch; most of the functionality comes from PHOENIX-3600. Unfortunately, since PhoenixRDD extends 'RDD' and not 'NewHadoopRDD', we don't get some of the niceties for free. There was a good reason for this that's now lost to me... TL;DR: If used in conjunction with PHOENIX-3600, I observed Spark data load times decrease by 30-40%. Longer version: Using [~elserj]'s take on the very cool https://github.com/joshelser/phoenix-performance toolset, I generated about 114M rows of TPC-DS data on a 5 RegionServer setup. I used a load-factor of 5, which created a 256-way split table we'll refer to as SALES. I also created a new table, pre-salted with 5 buckets, which we'll call SALES2, and UPSERT SELECTed the data over. Both tables had major compaction and UPDATE STATISTICS run on them as well. Using HDP 2.5 (Phoenix 4.7, Spark 1.6), I invoked spark-shell with 5 executors and 2 cores each. Each executor was co-located with one RegionServer. I then created a Phoenix RDD for each table and ran a Spark {{rdd.count}} operation on them. This effectively loads the entire table into Spark, and then Spark counts the rows. I ran this for each table, using the default case, then just the location changes, then the location changes plus the split.by.stats changes, and recorded the run-times 4 times each. I also closed out the spark-shell and ensured any Spark-cached files were removed, although I didn't account for caching on the HBase or OS side.
||SALES (256 regions, 261 stats splits)||t1||t2||t3||t4||
|control|120s|116s|111s|125s|
|location|96s|106s|94s|100s|
|location+stats|82s|74s|82s|82s|
||SALES2 (10 regions, 50 stats splits)||t1||t2||t3||t4||
|control|102s|83s|92s|96s|
|location|94s|78s|90s|81s|
|location+stats|62s|70s|79s|58s|
I have more screencaps of the Spark executors that report on the various task jobs, but in short, what we see is that the individual task times are much more evenly distributed (i.e. fewer outliers) and the overall task time is also decreased due to less network overhead. If anyone's using phoenix-spark and is able to test it out, that would be great. Also cc [~maghamraviki...@gmail.com] [~ndimiduk] [~sergey.soldatov] [~elserj] [~jamestaylor] > PhoenixRDD doesn't expose the preferred node locations to Spark > --- > > Key: PHOENIX-3601 > URL: https://issues.apache.org/jira/browse/PHOENIX-3601 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3601.patch > > > Follow-up to PHOENIX-3600, in order to let Spark know the preferred node > locations to assign partitions to, we need to update PhoenixRDD to retrieve > the underlying node location information from the splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
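For anyone wanting to rerun the measurement, here is a minimal sketch of the count test. It is written against the phoenix-spark DataFrame API for Spark 1.6 rather than the spark-shell PhoenixRDD calls described above (the DataFrame path goes through the same underlying PhoenixRDD); the table name and ZooKeeper quorum are placeholders for whatever the test cluster uses.
{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixCountBenchmark {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("phoenix-count-benchmark");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Load the table through the phoenix-spark datasource; with
        // PHOENIX-3600/PHOENIX-3601 applied, each partition reports its
        // preferred node so Spark can schedule node-local tasks.
        DataFrame sales = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "SALES")       // placeholder table name
                .option("zkUrl", "zk1:2181")    // placeholder ZooKeeper quorum
                .load();

        long start = System.currentTimeMillis();
        long rows = sales.count();              // full table scan, then count
        System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");

        jsc.stop();
    }
}
{code}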
[jira] [Updated] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark
[ https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Mahonin updated PHOENIX-3601: -- Attachment: PHOENIX-3601.patch > PhoenixRDD doesn't expose the preferred node locations to Spark > --- > > Key: PHOENIX-3601 > URL: https://issues.apache.org/jira/browse/PHOENIX-3601 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3601.patch > > > Follow-up to PHOENIX-3600, in order to let Spark know the preferred node > locations to assign partitions to, we need to update PhoenixRDD to retrieve > the underlying node location information from the splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info
[ https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828307#comment-15828307 ] Josh Mahonin commented on PHOENIX-3600: --- Mostly just ported the Phoenix/MR specific code from: https://github.com/apache/phoenix/blob/master/phoenix-hive/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputFormat.java#L151-L216 https://github.com/apache/phoenix/blob/master/phoenix-hive/src/main/java/org/apache/phoenix/hive/mapreduce/PhoenixInputSplit.java Also included a new Configuration property "phoenix.mapreduce.split.by.stats", which does effectively the same thing as the Hive-specific "split.by.stats". In short, the MR code was only generating InputSplits based on region splits and wasn't taking into account the possibility of more scans being generated by the statistics collection. I'll follow up on PHOENIX-3601 with the performance results I gathered in some Spark testing, but it would be awesome if other folks using Phoenix MR integration in some capacity could test this out. It's a bit of a double-whammy, since this patch gets us both node-awareness for the splits and increased potential parallelism by including the statistics-generated scans. cc [~maghamraviki...@gmail.com] [~ndimiduk] [~sergey.soldatov] [~elserj] [~jamestaylor], perhaps others :) > Core MapReduce classes don't provide location info > -- > > Key: PHOENIX-3600 > URL: https://issues.apache.org/jira/browse/PHOENIX-3600 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3600.patch > > > The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} > and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region > size or location information, leaving the execution engine (MR, Spark, etc.) > to randomly assign splits to nodes. > Interestingly, the phoenix-hive module has reimplemented these classes, > including the node-aware functionality. We should port a subset of those > changes back to the core code so that other engines can make use of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
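As a rough sketch of how a client MR job could opt in to both changes, assuming the property is a simple boolean flag (the name is taken from the comment above) and using a hypothetical record class and query:
{code:java}
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class SalesScanJob {

    // Hypothetical one-column record, purely for illustration.
    public static class SalesWritable implements DBWritable {
        long id;
        @Override
        public void readFields(ResultSet rs) throws SQLException { id = rs.getLong(1); }
        @Override
        public void write(PreparedStatement ps) throws SQLException { ps.setLong(1, id); }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Opt in to one InputSplit per statistics-generated scan instead of
        // one per region (analogous to the Hive-specific "split.by.stats").
        conf.setBoolean("phoenix.mapreduce.split.by.stats", true);

        Job job = Job.getInstance(conf, "sales-scan");
        job.setJarByClass(SalesScanJob.class);
        // Wires up PhoenixInputFormat, whose splits now carry region size
        // and location info for the scheduler to use.
        PhoenixMapReduceUtil.setInput(job, SalesWritable.class, "SALES",
                "SELECT ID FROM SALES");
        // ... set mapper, reducer, and output format as usual ...
    }
}
{code}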
[jira] [Updated] (PHOENIX-3600) Core MapReduce classes don't provide location info
[ https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Mahonin updated PHOENIX-3600: -- Attachment: PHOENIX-3600.patch > Core MapReduce classes don't provide location info > -- > > Key: PHOENIX-3600 > URL: https://issues.apache.org/jira/browse/PHOENIX-3600 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3600.patch > > > The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} > and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region > size or location information, leaving the execution engine (MR, Spark, etc.) > to randomly assign splits to nodes. > Interestingly, the phoenix-hive module has reimplemented these classes, > including the node-aware functionality. We should port a subset of those > changes back to the core code so that other engines can make use of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3336) get the wrong results when using the local secondary index
[ https://issues.apache.org/jira/browse/PHOENIX-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Houliang Qi updated PHOENIX-3336: - Description: When using a Phoenix local secondary index, if two clients concurrently upsert data to the same row key while another client uses the index column to retrieve data, that client gets wrong results. As shown in the attachments, I create a table called orders_5 and a local index on it called clerk_5; then I use two clients to upsert data to the same row key of orders_5. You will see that the local index clerk_5 has some stale records (which may be acceptable under eventual consistency). However, when you query with the previous value, you still get a result, and, more seriously, the result is wrong: it is neither the record you inserted before nor the record in the primary table (in this case, orders_5). was: When using a Phoenix local secondary index, if two clients concurrently upsert data to the same row key while the index column is used to retrieve data, the reader gets wrong results. As shown in the attachments, I create a table called orders_5 and a local index on it called clerk_5; then I use two clients to upsert data to the same row key of orders_5. You will see that the local index clerk_5 has some stale records (which may be acceptable under eventual consistency). However, when you query with the previous value, you still get a result, and, more seriously, the result is wrong: it is neither the record you inserted before nor the record in the primary table (in this case, orders_5). > get the wrong results when using the local secondary index > -- > > Key: PHOENIX-3336 > URL: https://issues.apache.org/jira/browse/PHOENIX-3336 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.8.0 > Environment: hbase-1.1.2 >Reporter: Houliang Qi > Labels: phoenix, secondaryIndex > Attachments: create_table_orders.sql, readme.txt, sample_1.csv, > sample_2.csv, wrong-index-2.png > > Original Estimate: 120h > Remaining Estimate: 120h > > When using a Phoenix local secondary index, if two clients concurrently > upsert data to the same row key while another client uses the index column > to retrieve data, that client gets wrong results. > As shown in the attachments, I create a table called orders_5 and a local > index on it called clerk_5; then I use two clients to upsert data to the > same row key of orders_5. You will see that the local index clerk_5 has some > stale records (which may be acceptable under eventual consistency). However, > when you query with the previous value, you still get a result, and, more > seriously, the result is wrong: it is neither the record you inserted before > nor the record in the primary table (in this case, orders_5) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
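A minimal sketch of the reported race, assuming hypothetical column names and a local quorum (the attached create_table_orders.sql defines the real schema): two threads upsert the same row key with different index-column values while the main thread queries through the local index.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalIndexConcurrencySketch {

    static final String URL = "jdbc:phoenix:localhost"; // placeholder quorum

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL);
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS ORDERS_5 (ID BIGINT PRIMARY KEY, CLERK VARCHAR)");
            st.execute("CREATE LOCAL INDEX IF NOT EXISTS CLERK_5 ON ORDERS_5 (CLERK)");
        }

        // Two writers race on the same row key with different CLERK values.
        Thread w1 = new Thread(() -> upsertLoop("clerk-A"));
        Thread w2 = new Thread(() -> upsertLoop("clerk-B"));
        w1.start();
        w2.start();

        // Reader repeatedly queries through the local index with one of the
        // racing values; a returned row whose CLERK is neither value would
        // exhibit the inconsistency described in the report.
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT ID, CLERK FROM ORDERS_5 WHERE CLERK = ?")) {
            ps.setString(1, "clerk-A");
            for (int i = 0; i < 100; i++) {
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1) + " -> " + rs.getString(2));
                    }
                }
            }
        }
        w1.join();
        w2.join();
    }

    static void upsertLoop(String clerk) {
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO ORDERS_5 VALUES (?, ?)")) {
            for (int i = 0; i < 1000; i++) {
                ps.setLong(1, 1L); // same row key every iteration
                ps.setString(2, clerk);
                ps.executeUpdate();
                conn.commit();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
{code}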
[jira] [Commented] (PHOENIX-3134) varbinary fields bulk load difference between MR/psql and upserts
[ https://issues.apache.org/jira/browse/PHOENIX-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827706#comment-15827706 ] Hudson commented on PHOENIX-3134: - SUCCESS: Integrated in Jenkins build Phoenix-master #1535 (See [https://builds.apache.org/job/Phoenix-master/1535/]) PHOENIX-3134 varbinary fields bulk load difference between MR/psql and (ankitsinghal59: rev a675211909415ca376e432d25f8a8822aadf5712) * (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/util/csv/CsvUpsertExecutorTest.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkLoadTool.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/types/PBinary.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/CsvBulkImportUtil.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/util/AbstractUpsertExecutorTest.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/util/json/JsonUpsertExecutor.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/util/json/JsonUpsertExecutorTest.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/schema/types/PVarbinary.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/util/csv/CsvUpsertExecutor.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/expression/function/EncodeFormat.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/mapreduce/CsvBulkImportUtilTest.java > varbinary fields bulk load difference between MR/psql and upserts > - > > Key: PHOENIX-3134 > URL: https://issues.apache.org/jira/browse/PHOENIX-3134 > Project: Phoenix > Issue Type: Improvement >Reporter: Sergey Soldatov >Assignee: Ankit Singhal > Fix For: 4.10.0 > > Attachments: PHOENIX-3134.patch, PHOENIX-3134_v1.patch > > > At the moment we have a strange difference between how the MR/psql upload > path and upserts handle varbinary. MR/psql expects it to be base64 encoded, > whereas upsert takes the input as a string. Should we add an option to load > it as plain data or base64 in MR/psql? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
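For context on the base64 expectation: a raw byte[] has to be encoded before it can appear in a CSV field consumed by the MR/psql path, while an UPSERT statement takes the string form directly. A minimal sketch using the standard java.util.Base64 (not a Phoenix API):
{code:java}
import java.util.Base64;

public class VarbinaryCsvValue {
    public static void main(String[] args) {
        // Raw VARBINARY payload as the application sees it.
        byte[] payload = new byte[] { 0x01, 0x02, (byte) 0xFF };

        // What an UPSERT VALUES statement would accept as-is must instead be
        // supplied in base64 form in a CSV field fed to MR/psql bulk load.
        String csvField = Base64.getEncoder().encodeToString(payload);
        System.out.println(csvField); // prints "AQL/"
    }
}
{code}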
[jira] [Resolved] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
[ https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal resolved PHOENIX-3548. Resolution: Fixed committed as part of PHOENIX-3134. > Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py > -- > > Key: PHOENIX-3548 > URL: https://issues.apache.org/jira/browse/PHOENIX-3548 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-3548) Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py
[ https://issues.apache.org/jira/browse/PHOENIX-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Singhal updated PHOENIX-3548: --- Fix Version/s: 4.10.0 > Create documentation for binaryEncoding option for CSVBulkLoad/pSQL.py > -- > > Key: PHOENIX-3548 > URL: https://issues.apache.org/jira/browse/PHOENIX-3548 > Project: Phoenix > Issue Type: Sub-task >Reporter: Ankit Singhal >Assignee: Ankit Singhal > Fix For: 4.10.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)