[jira] [Updated] (PHOENIX-953) Support UNNEST for ARRAY

2015-08-19 Thread Dumindu Buddhika (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dumindu Buddhika updated PHOENIX-953:
-
Attachment: PHOENIX-953-v2.patch

[~maryannxue] Thanks for the feedback!
Hi [~jamestaylor], [~maryannxue], this patch includes UnnestArrayQueryPlan and 
UnnestArrayResultIterator, changed according to the feedback. 

Here we changed DerivedTableNode to test an UNNEST query and see how 
UnnestArrayQueryPlan works. While doing that, there was a problem changing the 
type of the array column to the array's base type. That is solved here, but it 
looks hacky. 

> Support UNNEST for ARRAY
> 
>
> Key: PHOENIX-953
> URL: https://issues.apache.org/jira/browse/PHOENIX-953
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Dumindu Buddhika
> Attachments: PHOENIX-953-v1.patch, PHOENIX-953-v2.patch
>
>
> The UNNEST built-in function converts an array into a set of rows. This is 
> more than a built-in function, so it should be considered an advanced project.
> For examples, see the following Postgres documentation and blog posts: 
> http://www.postgresql.org/docs/8.4/static/functions-array.html
> http://www.anicehumble.com/2011/07/postgresql-unnest-function-do-many.html
> http://tech.valgog.com/2010/05/merging-and-manipulating-arrays-in.html
> So UNNEST is a way of converting an array to a flattened "table" which 
> can then be filtered on, ordered, grouped, etc.





[jira] [Updated] (PHOENIX-1118) Provide a tool for visualizing Phoenix tracing information

2015-08-19 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-1118:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: PHOENIX-1121)

> Provide a tool for visualizing Phoenix tracing information
> --
>
> Key: PHOENIX-1118
> URL: https://issues.apache.org/jira/browse/PHOENIX-1118
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: James Taylor
>Assignee: Nishani 
>  Labels: Java, SQL, Visualization, gsoc2015, mentor
> Attachments: MockUp1-TimeSlider.png, MockUp2-AdvanceSearch.png, 
> MockUp3-PatternDetector.png, MockUp4-FlameGraph.png, Screenshot of dependency 
> tree.png, Screenshot-loading-trace-list.png, m1-mockUI-tracedistribution.png, 
> m1-mockUI-tracetimeline.png, screenshot of tracing timeline.png, screenshot 
> of tracing web app.png, timeline.png
>
>
> Currently there's no means of visualizing the trace information provided by 
> Phoenix. We should provide some simple charting over our metrics tables. Take 
> a look at the following JIRA for sample queries: 
> https://issues.apache.org/jira/browse/PHOENIX-1115?focusedCommentId=14323151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14323151





[jira] [Commented] (PHOENIX-2163) Measure performance of Phoenix/Calcite querying

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703271#comment-14703271
 ] 

James Taylor commented on PHOENIX-2163:
---

[~shuxi0ng] - how are we looking to run the perf regression test for the 
calcite branch versus the head of the 4.5-HBase-VERSION (where VERSION matches 
the HBase version you're using on the calcite branch)? Let [~mujtabachohan] 
know if you have any questions on how to run it.

> Measure performance of Phoenix/Calcite querying
> ---
>
> Key: PHOENIX-2163
> URL: https://issues.apache.org/jira/browse/PHOENIX-2163
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Shuxiong Ye
> Attachments: PHOENIX-2163.patch
>
>
> The work to integrate Phoenix with Calcite has come along far enough that 
> queries both against the data table and through a secondary index are 
> functional. As a checkpoint, we should compare performance of as many queries 
> as possible in our regression suite for the calcite branch against the latest 
> Phoenix release (4.5.0). The runtime of these two systems should be the same, 
> so this will give us an idea of the overhead of query parsing and compilation 
> for Calcite. This is super important, as it'll identify outstanding work 
> that'll be necessary to do prior to any releases on top of this new stack.
> Source code of regression suite is at 
> https://github.com/mujtabachohan/PhoenixRegressor
> Connection string location: 
> https://github.com/mujtabachohan/PhoenixRegressor/blob/master/src/main/resources/settings.json
> Instructions on how to compile and run: 
> https://github.com/mujtabachohan/PhoenixRegressor/blob/master/README.md





[jira] [Created] (PHOENIX-2186) Creating backend services for the tracing web app

2015-08-19 Thread Nishani (JIRA)
Nishani  created PHOENIX-2186:
-

 Summary: Creating backend services for the tracing web app
 Key: PHOENIX-2186
 URL: https://issues.apache.org/jira/browse/PHOENIX-2186
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Nishani 
Assignee: Nishani 


This will include the following components:
Main class 
Pom file
Launch script 
Backend trace service API





[jira] [Created] (PHOENIX-2187) Creating the front-end application for Phoenix Tracing Web App

2015-08-19 Thread Nishani (JIRA)
Nishani  created PHOENIX-2187:
-

 Summary: Creating the front-end application for Phoenix Tracing 
Web App
 Key: PHOENIX-2187
 URL: https://issues.apache.org/jira/browse/PHOENIX-2187
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Nishani 
Assignee: Nishani 


This will include the following tracing visualization features:
List - lists the traces with their attributes
Trace Count - Chart view over the trace description
Dependency Tree - tree view of  trace ids
Timeline - timeline of trace ids
Trace Distribution - Distribution chart of hosts of traces





[jira] [Commented] (PHOENIX-953) Support UNNEST for ARRAY

2015-08-19 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703555#comment-14703555
 ] 

Maryann Xue commented on PHOENIX-953:
-

Thanks for updating the patch, [~Dumindux]! Actually, your original patch (v1) 
was already good enough for the runtime part, except that you should return a 
new tuple for each array element. So the first two steps in patch v1 were 
perfectly fine: 1) get the tuple from the "delegate"; 2) retrieve the array 
value using the standard expression.evaluate() interface. I don't see why we 
should copy that ProjectedColumnExpression logic into the iterator, and since 
the array expression is included in the UnnestExpression, this step can be removed.
So the iterator should be something like:
{code}
public class UnnestArrayResultIterator implements ResultIterator {
    private final ResultIterator delegate;
    private final UnnestExpression unnestExpression;
    private final TupleProjector projector;
    private Tuple current;
    private boolean closed;

    public UnnestArrayResultIterator(ResultIterator iterator, UnnestExpression unnestExpression) {
        this.delegate = iterator;
        this.unnestExpression = unnestExpression;
        // TupleProjector should provide a new constructor that deduces the
        // KeyValueSchema based on the input expressions.
        this.projector = new TupleProjector(new Expression[] { unnestExpression });
    }

    @Override
    public Tuple next() throws SQLException {
        if (closed)
            return null;
        while (unnestExpression.isEnd()) {
            current = delegate.next();
            if (current == null) {
                this.closed = true;
                return null;
            }
            // this is to check for empty arrays.
            if (unnestExpression.evaluate(current)) {
                unnestExpression.rewind();
            }
        }
        return projector.projectResults(current);
    }

    // other functions
}
{code}

And the UnnestExpression should be like:
{code}
public class UnnestExpression extends BaseCompoundExpression {
    private int index = -1;
    private int length = -1;

    // constructors

    public boolean isEnd() {
        return index >= length;
    }

    public void rewind() {
        index = 0;
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        if (isEnd()) {
            if (!children.get(0).evaluate(tuple, ptr))
                return false;
            length = PArrayDataType.getArrayLength(ptr, getDataType(), getMaxLength());
            index = 0;
        }

        if (isEnd())
            return false;

        PArrayDataType.positionAtArrayElement(ptr, index++, getDataType(), getMaxLength());
        return true;
    }

    // other functions
}
{code}

The compiler support for UNNEST will be done by Calcite in the Phoenix/Calcite 
model, so you can remove all other changes from the compiler (except the 
parser, am I right, [~jamestaylor]?). As to the test cases, we need two types 
of tests:
1. The end-to-end query tests, which will be "ignored" for now.
2. The UnnestArrayQueryPlan unit tests, to test the behavior of the query plan 
and result iterator only. 

> Support UNNEST for ARRAY
> 
>
> Key: PHOENIX-953
> URL: https://issues.apache.org/jira/browse/PHOENIX-953
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Dumindu Buddhika
> Attachments: PHOENIX-953-v1.patch, PHOENIX-953-v2.patch
>
>
> The UNNEST built-in function converts an array into a set of rows. This is 
> more than a built-in function, so it should be considered an advanced project.
> For examples, see the following Postgres documentation and blog posts: 
> http://www.postgresql.org/docs/8.4/static/functions-array.html
> http://www.anicehumble.com/2011/07/postgresql-unnest-function-do-many.html
> http://tech.valgog.com/2010/05/merging-and-manipulating-arrays-in.html
> So UNNEST is a way of converting an array to a flattened "table" which 
> can then be filtered on, ordered, grouped, etc.





[jira] [Commented] (PHOENIX-953) Support UNNEST for ARRAY

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703587#comment-14703587
 ] 

James Taylor commented on PHOENIX-953:
--

Thanks so much for the patch, [~Dumindux]. Yes, that sounds right, 
[~maryannxue]. It'd be good, [~Dumindux], if you could run your end-to-end 
tests against hsqldb. I'm particularly interested in seeing a test that joins on 
the ordinal for the parallel array use case. Also, perhaps you can commit your 
tests on the calcite branch so we can test it end-to-end as well?

> Support UNNEST for ARRAY
> 
>
> Key: PHOENIX-953
> URL: https://issues.apache.org/jira/browse/PHOENIX-953
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Dumindu Buddhika
> Attachments: PHOENIX-953-v1.patch, PHOENIX-953-v2.patch
>
>
> The UNNEST built-in function converts an array into a set of rows. This is 
> more than a built-in function, so it should be considered an advanced project.
> For examples, see the following Postgres documentation and blog posts: 
> http://www.postgresql.org/docs/8.4/static/functions-array.html
> http://www.anicehumble.com/2011/07/postgresql-unnest-function-do-many.html
> http://tech.valgog.com/2010/05/merging-and-manipulating-arrays-in.html
> So UNNEST is a way of converting an array to a flattened "table" which 
> can then be filtered on, ordered, grouped, etc.





[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703608#comment-14703608
 ] 

James Taylor commented on PHOENIX-2154:
---

Thanks for the patch, [~maghamravikiran]. It's my understanding (from 
[~lhofhansl]) that if you use this command:
{code}
TableMapReduceUtil.initTableReducerJob(logicalIndexTable, null, job);
{code}
the same context.write(outputKey, kv) we do will still work, and the MR 
framework will issue the required batched mutations for the KVs we write through 
direct HBase calls. Is that not the case?

If that works, then I think the code changes will be much smaller. I'm not sure 
what controls the amount of batching the HBase calls will do.
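For reference, here's a minimal sketch of that wiring (the mapper class and its 
output types are assumed for illustration, and conf/logicalIndexTable are as in 
the snippet above; none of this is from the attached patch):
{code}
// Sketch only: IndexBuildMapper is a hypothetical mapper that emits
// (ImmutableBytesWritable, KeyValue) pairs via context.write(), as described above.
Job job = Job.getInstance(conf, "phoenix-index-build");
job.setMapperClass(IndexBuildMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
// Passing null as the reducer installs the HBase framework's identity reducer,
// which writes the mapper output to the index table in batches -- whether it
// accepts the KVs we emit as-is is exactly the open question above.
TableMapReduceUtil.initTableReducerJob(logicalIndexTable, null, job);
{code}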

> Failure of one mapper should not affect other mappers in MR index build
> ---
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows arriving 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?





[jira] [Commented] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-08-19 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703718#comment-14703718
 ] 

Samarth Jain commented on PHOENIX-2149:
---

[~tdsilva] - was this committed to 4.x branches too? If yes, then hopefully 
this fix was part of our 4.5.1 patch release too. Can you confirm?

> MAX Value of Sequences not honored when closing Connection between calls to 
> NEXT VALUE FOR
> --
>
> Key: PHOENIX-2149
> URL: https://issues.apache.org/jira/browse/PHOENIX-2149
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.4.0
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Fix For: 4.5.1
>
> Attachments: PHOENIX-2149-v2.patch, PHOENIX-2149.patch
>
>
> There appears to be an issue related to closing connections between calls 
> to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
> found scenarios where, when allocating sequences near the MAX, the MAX is not 
> honored and values greater than the max are returned by NEXT VALUE FOR.
> It appears to be related to the logic to return all sequences on connection 
> close. It looks like, if you close the connection between each invocation, then 
> when you hit the max value, sequence values continue to be doled out instead 
> of the expected error being thrown. It looks like for some reason the 
> limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
> for the sequence in this case.
> I added the test below to SequenceBulkAllocationIT that repros the issue.
> If I either a) remove the nextConnection() call that keeps recycling 
> connections in the test below or b) comment out the code in 
> PhoenixConnection.close() that calls services.removeConnection(), the test 
> below starts to pass.
> I wasn't able to repro in Squirrel because I guess it doesn't recycle 
> connections.
> {code}
> @Test
> public void testNextValuesForSequenceClosingConnections() throws Exception {
>     final SequenceProperties props = new SequenceProperties.Builder()
>             .incrementBy(1).startsWith(4990).cacheSize(10)
>             .minValue(4990).maxValue(5000).numAllocated(4989).build();
> 
>     // Create Sequence
>     nextConnection();
>     createSequenceWithMinMax(props);
>     nextConnection();
> 
>     // Try and get next value
>     try {
>         long val = 0L;
>         for (int i = 0; i <= 11; i++) {
>             ResultSet rs = conn.createStatement().executeQuery(
>                     String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
>             rs.next();
>             val = rs.getLong(1);
>             nextConnection();
>         }
>         fail("Expect to fail as this value is greater than seq max " + val);
>     } catch (SQLException e) {
>         assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
>                 e.getErrorCode());
>         assertTrue(e.getNextException() == null);
>     }
> }
> {code}





[jira] [Commented] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703721#comment-14703721
 ] 

James Taylor commented on PHOENIX-2149:
---

I cherry-picked this fix to the 4.5 branches 
(https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=3f03ced5392ecc324affcbdf3a51d2c4ecc18126)
 so it's part of the 4.5.1 release.

It looks like it was committed to the 4.x branches already (but FYI, that 
doesn't make it part of any patch release; it would make it part of the next 
minor release, 4.6.0).

> MAX Value of Sequences not honored when closing Connection between calls to 
> NEXT VALUE FOR
> --
>
> Key: PHOENIX-2149
> URL: https://issues.apache.org/jira/browse/PHOENIX-2149
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.4.0
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Fix For: 4.5.1
>
> Attachments: PHOENIX-2149-v2.patch, PHOENIX-2149.patch
>
>
> There appears to be an issue related to closing connections between calls 
> to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
> found scenarios where, when allocating sequences near the MAX, the MAX is not 
> honored and values greater than the max are returned by NEXT VALUE FOR.
> It appears to be related to the logic to return all sequences on connection 
> close. It looks like, if you close the connection between each invocation, then 
> when you hit the max value, sequence values continue to be doled out instead 
> of the expected error being thrown. It looks like for some reason the 
> limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
> for the sequence in this case.
> I added the test below to SequenceBulkAllocationIT that repros the issue.
> If I either a) remove the nextConnection() call that keeps recycling 
> connections in the test below or b) comment out the code in 
> PhoenixConnection.close() that calls services.removeConnection(), the test 
> below starts to pass.
> I wasn't able to repro in Squirrel because I guess it doesn't recycle 
> connections.
> {code}
> @Test
> public void testNextValuesForSequenceClosingConnections() throws Exception {
>     final SequenceProperties props = new SequenceProperties.Builder()
>             .incrementBy(1).startsWith(4990).cacheSize(10)
>             .minValue(4990).maxValue(5000).numAllocated(4989).build();
> 
>     // Create Sequence
>     nextConnection();
>     createSequenceWithMinMax(props);
>     nextConnection();
> 
>     // Try and get next value
>     try {
>         long val = 0L;
>         for (int i = 0; i <= 11; i++) {
>             ResultSet rs = conn.createStatement().executeQuery(
>                     String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
>             rs.next();
>             val = rs.getLong(1);
>             nextConnection();
>         }
>         fail("Expect to fail as this value is greater than seq max " + val);
>     } catch (SQLException e) {
>         assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
>                 e.getErrorCode());
>         assertTrue(e.getNextException() == null);
>     }
> }
> {code}





[jira] [Commented] (PHOENIX-1812) Only sync table metadata when necessary

2015-08-19 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703748#comment-14703748
 ] 

Thomas D'Silva commented on PHOENIX-1812:
-

[~jamestaylor]

If we scale the txn read pointer down by 1 million, do we have to worry about 
multiple transactions occurring in one millisecond? If one of them changes the 
table metadata, will a later transaction be able to view the change (or will it 
only view the table metadata as of the start)? 
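To illustrate the concern with made-up numbers (Tephra transaction pointers are 
epoch milliseconds scaled up by 1,000,000):
{code}
// Illustration only -- the pointer values are invented.
long txn1ReadPointer = 1440028800123L * 1_000_000 + 42; // first txn in a millisecond
long txn2ReadPointer = 1440028800123L * 1_000_000 + 97; // later txn, same millisecond
long ts1 = txn1ReadPointer / 1_000_000; // 1440028800123
long ts2 = txn2ReadPointer / 1_000_000; // 1440028800123 -- identical after scaling,
// so metadata changed by the first txn cannot be distinguished from the
// second txn's view by timestamp alone.
{code}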

In MetaDataClient.updateCache() I used connection.getResolvedTime(result) as 
the resolved time when adding a table and when updating the resolved 
timestamp, since this sets the timestamp to the mutation time for the non-txn 
and SCN cases.

{code}
public long getResolvedTime(MetaDataMutationResult result) {
    PTable table = result.getTable();
    Transaction transaction = mutationState.getTransaction();
    return (scn != null && table != null && scn > table.getTimeStamp()) ? scn
            : transaction == null ? result.getMutationTime() : transaction.getReadPointer();
}
{code}

If we don't pass the resolved time to addIndexesFromPhysicalTable, it will just 
call getResolvedTimestamp from updateCache, and this would be exactly the same 
as passing the resolved time, right?


In getTableStats(), should I just use the new getResolvedTimestamp()?

{code}
if (isSharedIndex) {
    return connection.getQueryServices().getTableStats(
            table.getPhysicalName().getBytes(), getResolvedTimestamp());
}
{code}

> Only sync table metadata when necessary
> ---
>
> Key: PHOENIX-1812
> URL: https://issues.apache.org/jira/browse/PHOENIX-1812
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-1812-v2.patch, PHOENIX-1812-v3.patch, 
> PHOENIX-1812-v4-WIP.patch, PHOENIX-1812-v5.patch, PHOENIX-1812.patch, 
> PHOENIX-1812.patch, PHOENIX-1812.patch
>
>
> With transactions, we hold the timestamp at the point when the transaction 
> was opened. We can skip the MetaDataEndpoint getTable RPC in 
> MetaDataClient.updateCache(), which checks that the client has the latest table, 
> if we've already checked at the current transaction ID timestamp. We can keep 
> track of which tables we've already updated in PhoenixConnection.





[GitHub] phoenix pull request: PHOENIX-2023 Build tar.gz only on release pr...

2015-08-19 Thread ndimiduk
Github user ndimiduk commented on the pull request:

https://github.com/apache/phoenix/pull/110#issuecomment-132789228
  
I haven't done a Phoenix release, but this looks like what I had in mind. 
Thanks for fixing the missing headers while you're in there -- these should be 
caught by the apache:rat plugin; I'm not sure why they're not.

@JamesRTaylor and @chrajeshbabu have done Phoenix releases recently. Does 
this look good to you guys?




[jira] [Commented] (PHOENIX-2023) Build tgz only on release profile

2015-08-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703764#comment-14703764
 ] 

ASF GitHub Bot commented on PHOENIX-2023:
-

Github user ndimiduk commented on the pull request:

https://github.com/apache/phoenix/pull/110#issuecomment-132789228
  
I haven't done a Phoenix release, but this looks like what I had in mind. 
Thanks for fixing the missing headers while you're in there -- these should be 
caught by the apache:rat plugin; I'm not sure why they're not.

@JamesRTaylor and @chrajeshbabu have done Phoenix releases recently. Does 
this look good to you guys?


> Build tgz only on release profile
> -
>
> Key: PHOENIX-2023
> URL: https://issues.apache.org/jira/browse/PHOENIX-2023
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Nick Dimiduk
>Assignee: Gabor Liptak
>  Labels: beginner
>
> We should follow [~enis]'s lead on HBASE-13816 and save everyone some time on 
> the build cycle by moving some (all?) of the assembly bits to a release 
> profile that's only invoked at RC time.





[jira] [Created] (PHOENIX-2188) Overriding hbase.client.scanner.caching doesn't work

2015-08-19 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-2188:
-

 Summary: Overriding hbase.client.scanner.caching doesn't work
 Key: PHOENIX-2188
 URL: https://issues.apache.org/jira/browse/PHOENIX-2188
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain


Below is the test I wrote, which demonstrates that Phoenix's override of 
1000 for the scanner cache size is not being used:

{code} 
@Test
public void testScannerCacheSize() throws Exception {
    Connection connection = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
    PhoenixConnection phxConn = connection.unwrap(PhoenixConnection.class);
    // check config value in query services
    System.out.println(PhoenixDriver.INSTANCE.getQueryServices().getProps()
            .get(QueryServices.SCAN_CACHE_SIZE_ATTRIB));

    Statement stmt = phxConn.createStatement();
    PhoenixStatement phxStmt = stmt.unwrap(PhoenixStatement.class);
    // double check the config size by looking at statement fetch size
    System.out.println(phxStmt.getFetchSize());
}
{code} 

The offending code snippet is:
{code}
QueryServices.withDefaults() {
    Configuration config = HBaseFactoryProvider.getConfigurationFactory().getConfiguration();
    QueryServicesOptions options = new QueryServicesOptions(config)
            .setIfUnset(STATS_USE_CURRENT_TIME_ATTRIB, DEFAULT_STATS_USE_CURRENT_TIME)
            ..
            .setIfUnset(SCAN_CACHE_SIZE_ATTRIB, DEFAULT_SCAN_CACHE_SIZE)
{code} 
The configuration returned by 
HBaseFactoryProvider.getConfigurationFactory().getConfiguration() has the 
hbase.client.scanner.caching set to 100. So the override doesn't take place 
because we are using setIfUnset.
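One possible shape of a fix, sketched against the plain Configuration API, 
under the assumption that Phoenix's default should always win over the HBase 
client default (a sketch, not the committed fix):
{code}
// Sketch only: overwrite the inherited HBase client default (100)
// unconditionally instead of deferring to it via setIfUnset().
Configuration config = HBaseFactoryProvider.getConfigurationFactory().getConfiguration();
config.setInt(QueryServices.SCAN_CACHE_SIZE_ATTRIB,
        QueryServicesOptions.DEFAULT_SCAN_CACHE_SIZE);
QueryServicesOptions options = new QueryServicesOptions(config);
{code}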
 
Another override that potentially won't work in the future, if HBase provides 
its own default, is the RpcControllerFactory (hbase.rpc.controllerfactory.class), 
because of:
{code}
setIfUnset(RpcControllerFactory.CUSTOM_CONTROLLER_CONF_KEY, DEFAULT_CLIENT_RPC_CONTROLLER_FACTORY)
{code}





[jira] [Created] (PHOENIX-2189) Starting from HBase 1.x, Phoenix probably shouldn't override the hbase.client.scanner.caching attribute

2015-08-19 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-2189:
-

 Summary: Starting from HBase 1.x, Phoenix probably shouldn't 
override the hbase.client.scanner.caching attribute
 Key: PHOENIX-2189
 URL: https://issues.apache.org/jira/browse/PHOENIX-2189
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain


After PHOENIX-2188 is fixed, we need to think about whether it makes sense to 
override the scanner cache size in Phoenix for the HBase 1.x branches. For 
example, in HBase 1.1 the default value of hbase.client.scanner.caching is now 
Integer.MAX_VALUE.

{code:xml}
<property>
  <name>hbase.client.scanner.caching</name>
  <value>2147483647</value>
  <description>Number of rows that we try to fetch when calling next
  on a scanner if it is not served from (local, client) memory. This configuration
  works together with hbase.client.scanner.max.result.size to try and use the
  network efficiently. The default value is Integer.MAX_VALUE by default so that
  the network will fill the chunk size defined by hbase.client.scanner.max.result.size
  rather than be limited by a particular number of rows since the size of rows varies
  table to table. If you know ahead of time that you will not require more than a certain
  number of rows from a scan, this configuration should be set to that row limit via
  Scan#setCaching. Higher caching values will enable faster scanners but will eat up more
  memory and some calls of next may take longer and longer times when the cache is empty.
  Do not set this value such that the time between invocations is greater than the scanner
  timeout; i.e. hbase.client.scanner.timeout.period
  </description>
</property>
{code}

From the comments it sounds like, by default, HBase is going to provide an 
upper bound on the scanner cache size in bytes and not number of records. 

If we end up overriding hbase.client.scanner.caching to 1000, then for 
narrower rows we will likely be fetching too few rows. For wider rows, the 
bytes limit will likely kick in to make sure we don't end up caching too much 
on the client.

Maybe we shouldn't be using the scanner caching override at all? Thoughts? 
[~jamestaylor], [~lhofhansl]





[jira] [Commented] (PHOENIX-2023) Build tgz only on release profile

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704058#comment-14704058
 ] 

James Taylor commented on PHOENIX-2023:
---

I'd defer to what [~mujtabachohan] says.

> Build tgz only on release profile
> -
>
> Key: PHOENIX-2023
> URL: https://issues.apache.org/jira/browse/PHOENIX-2023
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Nick Dimiduk
>Assignee: Gabor Liptak
>  Labels: beginner
>
> We should follow [~enis]'s lead on HBASE-13816 and save everyone some time on 
> the build cycle by moving some (all?) of the assembly bits to a release 
> profile that's only invoked at RC time.





[jira] [Commented] (PHOENIX-1835) Adjust MetaDataEndPointImpl timestamps if table is transactional

2015-08-19 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704062#comment-14704062
 ] 

Thomas D'Silva commented on PHOENIX-1835:
-

I discussed this offline with @James Taylor and we decided this is not required.

> Adjust MetaDataEndPointImpl timestamps if table is transactional
> 
>
> Key: PHOENIX-1835
> URL: https://issues.apache.org/jira/browse/PHOENIX-1835
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Thomas D'Silva
>
> Phoenix correlates table metadata with the table data based on timestamp. 
> Since Tephra is adjusting timestamps for the data, we need to do the same for 
> the metadata operations (which aren't transactional through Tephra). Take a 
> look at MetaDataEndPointImpl and the MetaDataMutationResult where we return 
> the server timestamp (i.e. MetaDataMutationResult.getTable() for example). 
> This timestamp should be run through the TransactionUtil.translateTimestamp() 
> method.
> Add a point-in-time test with a table being altered, but your connection 
> being before that time (with CURRENT_SCN). We'll need to make sure 
> the Puts to the SYSTEM.CATALOG get timestamped correctly (but I think the 
> above will cause that).
> Also, my other hack in PostDDLCompiler, should not be necessary after this:
> {code}
> // FIXME: DDL operations aren't transactional, so we're basing the timestamp
> // on a server timestamp. Not sure what the fix should be. We don't need
> // conflict detection nor filtering of invalid transactions in this case,
> // so maybe this is ok.
> if (tableRef.getTable().isTransactional()) {
>     ts = TransactionUtil.translateMillis(ts);
> }





[jira] [Resolved] (PHOENIX-1835) Adjust MetaDataEndPointImpl timestamps if table is transactional

2015-08-19 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-1835.
-
Resolution: Invalid

> Adjust MetaDataEndPointImpl timestamps if table is transactional
> 
>
> Key: PHOENIX-1835
> URL: https://issues.apache.org/jira/browse/PHOENIX-1835
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Thomas D'Silva
>
> Phoenix correlates table metadata with the table data based on timestamp. 
> Since Tephra is adjusting timestamps for the data, we need to do the same for 
> the metadata operations (which aren't transactional through Tephra). Take a 
> look at MetaDataEndPointImpl and the MetaDataMutationResult where we return 
> the server timestamp (i.e. MetaDataMutationResult.getTable() for example). 
> This timestamp should be run through the TransactionUtil.translateTimestamp() 
> method.
> Add a point-in-time test with a table being altered, but your connection 
> being before that time (with CURRENT_SCN). We'll need to make sure 
> the Puts to the SYSTEM.CATALOG get timestamped correctly (but I think the 
> above will cause that).
> Also, my other hack in PostDDLCompiler, should not be necessary after this:
> {code}
> // FIXME: DDL operations aren't transactional, so we're basing the timestamp
> // on a server timestamp. Not sure what the fix should be. We don't need
> // conflict detection nor filtering of invalid transactions in this case,
> // so maybe this is ok.
> if (tableRef.getTable().isTransactional()) {
>     ts = TransactionUtil.translateMillis(ts);
> }





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread Dumindu Buddhika (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704068#comment-14704068
 ] 

Dumindu Buddhika commented on PHOENIX-2160:
---

When stepping through a test case with the debugger, at this part:
{code}
final List<KeyValueColumnExpression> indexKVs = Lists.newArrayList();
// Create anon visitor to find reference to array in a generic way
children.get(0).accept(new KeyValueExpressionVisitor() {
    @Override
    public Void visit(KeyValueColumnExpression expression) {
        if (expression.getDataType().isArrayType()) {
            indexKVs.add(expression);
        }
        return null;
    }
});
{code}
it seems that the array expression comes in as a ProjectedColumnExpression, not a 
KeyValueColumnExpression, so indexKVs does not get updated. That is why the 
problem occurs, I think. 

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704076#comment-14704076
 ] 

James Taylor commented on PHOENIX-2160:
---

Good catch, [~Dumindux]. Would you mind putting together a patch for this with 
a unit test that validates that the optimization is being used?

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread Dumindu Buddhika (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704096#comment-14704096
 ] 

Dumindu Buddhika commented on PHOENIX-2160:
---

[~jamestaylor], I do not quite understand how this should be fixed.

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704133#comment-14704133
 ] 

James Taylor commented on PHOENIX-2160:
---

You'd want to change it to find ProjectedColumnExpression instead, so something 
like this:
{code}
/**
 * Implementation of ExpressionVisitor where only ProjectedColumnExpression
 * is being visited
 *
 * @since 0.1
 */
public abstract class ProjectedColumnExpressionVisitor extends 
        StatelessTraverseAllExpressionVisitor<Void> {
    @Override
    abstract public Void visit(ProjectedColumnExpression node);
}

final List<ProjectedColumnExpression> indexKVs = Lists.newArrayList();
// Create anon visitor to find reference to array in a generic way
children.get(0).accept(new ProjectedColumnExpressionVisitor() {
    @Override
    public Void visit(ProjectedColumnExpression expression) {
        if (expression.getDataType().isArrayType()) {
            indexKVs.add(expression);
        }
        return null;
    }
});
{code}

Make sure you have a unit test where the ARRAY is used in the primary key 
constraint, but I think the above will work.

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread Dumindu Buddhika (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704143#comment-14704143
 ] 

Dumindu Buddhika commented on PHOENIX-2160:
---

Thanks, [~jamestaylor]. I tried that. I was wondering how this part in 
BaseScannerRegionObserver should be changed:

{code}
Cell rowKv = result.get(0);
for (KeyValueColumnExpression kvExp : arrayKVRefs) {
    if (kvExp.evaluate(tuple, ptr)) {
        for (int idx = tuple.size() - 1; idx >= 0; idx--) {
            Cell kv = tuple.getValue(idx);
            if (Bytes.equals(kvExp.getColumnFamily(), 0, kvExp.getColumnFamily().length,
                    kv.getFamilyArray(), kv.getFamilyOffset(), kv.getFamilyLength())
                    && Bytes.equals(kvExp.getColumnName(), 0, kvExp.getColumnName().length,
                            kv.getQualifierArray(), kv.getQualifierOffset(), kv.getQualifierLength())) {
                // remove the kv that has the full array values.
{code}

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





[jira] [Commented] (PHOENIX-2160) Projection of specific array index does not work

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704155#comment-14704155
 ] 

James Taylor commented on PHOENIX-2160:
---

In your visitor, instead of adding the ProjectedColumnExpression, instantiate a 
new KeyValueColumnExpression. Since you're on the client side, you should be 
able to get the column from the ProjectedColumnExpression: add a getColumn accessor 
that uses the position to index into the columns Collection (try changing that to 
a List too so you can index into it). Given a PColumn, you can 
instantiate a KeyValueColumnExpression.
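Putting that together, a minimal sketch of the suggestion (the getColumn() 
accessor is assumed to be newly added; it does not exist yet):
{code}
// Sketch only: rewrite each ProjectedColumnExpression over an array into a
// KeyValueColumnExpression so the existing server-side matching logic finds it.
final List<KeyValueColumnExpression> indexKVs = Lists.newArrayList();
children.get(0).accept(new ProjectedColumnExpressionVisitor() {
    @Override
    public Void visit(ProjectedColumnExpression expression) {
        if (expression.getDataType().isArrayType()) {
            PColumn column = expression.getColumn(); // hypothetical new accessor
            indexKVs.add(new KeyValueColumnExpression(column));
        }
        return null;
    }
});
{code}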

> Projection of specific array index does not work
> 
>
> Key: PHOENIX-2160
> URL: https://issues.apache.org/jira/browse/PHOENIX-2160
> Project: Phoenix
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> PHOENIX-10, which allowed projection of a specific array index, does not work now. 
> Was looking into the code for something and found this issue. Let me know if 
> I am missing something.





PhoenixHbaseStorage on secure cluster

2015-08-19 Thread Siddhi Mehta
Hey Guys


I am trying to make use of PhoenixHBaseStorage to write to an HBase table.


The way we start this pig job is from within a map task (similar to Oozie).


I run TableMapReduceUtil.initCredentials(job) on the client to get the
correct AuthTokens for my map task.
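For context, the client-side call is roughly this (the job name and setup 
here are assumed):
{code}
// Sketch of the submission side: initCredentials() obtains an HBase delegation
// token for the submitting user and adds it to the job credentials, so tasks
// can authenticate to HBase without a keytab.
Job job = Job.getInstance(conf, "pig-phoenix-store"); // name assumed
TableMapReduceUtil.initCredentials(job);
{code}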


I have ensured that hbase-site.xml is on the classpath for the pig job, and
also the hbase-client and hbase-server jars.


Any ideas on what I could be missing?


I am using Phoenix 4.5 and HBase 0.98.13.


I see the following exception in the logs of the pig job that tries
writing to HBase:



Aug 20, 2015 12:04:31 AM
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper 
INFO: Process identifier=hconnection-0x3c1e23ff connecting to ZooKeeper
ensemble=hmaster1:2181,hmaster2:2181,hmaster3:2181
Aug 20, 2015 12:04:31 AM
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
makeStub
INFO: getMaster attempt 1 of 35 failed; retrying after sleep of 100,
exception=com.google.protobuf.ServiceException:
java.lang.NullPointerException
Aug 20, 2015 12:04:31 AM
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
makeStub
INFO: getMaster attempt 2 of 35 failed; retrying after sleep of 200,
exception=com.google.protobuf.ServiceException: java.io.IOException: Call
to blitz2-mnds1-3-sfm.ops.sfdc.net/{IPAddress}:6 failed on local
exception: java.io.EOFException
Aug 20, 2015 12:04:31 AM
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
makeStub
INFO: getMaster attempt 3 of 35 failed; retrying after sleep of 300,
exception=com.google.protobuf.ServiceException: java.io.IOException: Call
to blitz2-mnds1-3-sfm.ops.sfdc.net/{IPAddress}:6 failed on local
exception: java.io.EOFException
Aug 20, 2015 12:04:31 AM
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
makeStub
INFO: getMaster attempt 4 of 35 failed; retrying after sleep of 500,
exception=com.google.protobuf.ServiceException: java.io.IOException: Call
to blitz2-mnds1-3-sfm.ops.sfdc.net/{IPAddress}:6 failed on local
exception: java.io.EOFException
Aug 20, 2015 12:04:32 AM:


[jira] [Created] (PHOENIX-2190) Updating the Documentation

2015-08-19 Thread Nishani (JIRA)
Nishani  created PHOENIX-2190:
-

 Summary: Updating the Documentation
 Key: PHOENIX-2190
 URL: https://issues.apache.org/jira/browse/PHOENIX-2190
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Nishani 
Assignee: Nishani 


The documentation will contain the following:
How to start the Tracing Web Application
How to change the port number on which it runs
Five visualization features in the Tracing Web Application





[jira] [Created] (PHOENIX-2191) Union Support in Phoenix/Calcite Integration

2015-08-19 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-2191:


 Summary: Union Support in Phoenix/Calcite Integration
 Key: PHOENIX-2191
 URL: https://issues.apache.org/jira/browse/PHOENIX-2191
 Project: Phoenix
  Issue Type: Task
Reporter: Maryann Xue
Assignee: Maryann Xue








[jira] [Created] (PHOENIX-2192) Implement PhoenixUnion

2015-08-19 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-2192:


 Summary: Implement PhoenixUnion
 Key: PHOENIX-2192
 URL: https://issues.apache.org/jira/browse/PHOENIX-2192
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Maryann Xue








[jira] [Created] (PHOENIX-2193) Add rules to push down Sort through Union

2015-08-19 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-2193:


 Summary: Add rules to push down Sort through Union
 Key: PHOENIX-2193
 URL: https://issues.apache.org/jira/browse/PHOENIX-2193
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Maryann Xue
Assignee: Maryann Xue


Sort will be pushed through Union to its inputs, and meanwhile the original 
sort will become a merge-sort attribute in the PhoenixUnion.
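A rough sketch of what such a rule could look like in Calcite terms (the class 
name and the handling of the merge-sort attribute are assumptions, not the 
committed design):
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.core.Union;

// Sketch only: pushes a Sort below a Union by copying it onto each input;
// the Union on top is then left to merge already-sorted streams.
public class SortUnionTransposeRule extends RelOptRule {
    public SortUnionTransposeRule() {
        super(operand(Sort.class, operand(Union.class, any())));
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        Sort sort = call.rel(0);
        Union union = call.rel(1);
        List<RelNode> sortedInputs = new ArrayList<RelNode>();
        for (RelNode input : union.getInputs()) {
            sortedInputs.add(sort.copy(sort.getTraitSet(), input, sort.getCollation()));
        }
        // A PhoenixUnion would additionally record the collation so that it
        // performs a merge rather than a plain concatenation.
        call.transformTo(union.copy(union.getTraitSet(), sortedInputs, union.all));
    }
}
{code}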





[jira] [Assigned] (PHOENIX-2192) Implement PhoenixUnion

2015-08-19 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue reassigned PHOENIX-2192:


Assignee: Maryann Xue

> Implement PhoenixUnion
> --
>
> Key: PHOENIX-2192
> URL: https://issues.apache.org/jira/browse/PHOENIX-2192
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>






[jira] [Resolved] (PHOENIX-2192) Implement PhoenixUnion

2015-08-19 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue resolved PHOENIX-2192.
--
Resolution: Fixed

> Implement PhoenixUnion
> --
>
> Key: PHOENIX-2192
> URL: https://issues.apache.org/jira/browse/PHOENIX-2192
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>






[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-19 Thread maghamravikiran (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704326#comment-14704326
 ] 

maghamravikiran commented on PHOENIX-2154:
--

True, [~giacomotaylor], Lars' point is valid, but it doesn't fit our use case, as we 
need to have a reducer that updates the state of the index table once the mapper 
output is committed to HBase. That's the reason I am forcing an autoCommit 
in the mapper: by the time the reducer begins execution, we know the 
data is in the index table and we are left with just updating the state. 

[~lhofhansl] please correct me if I am wrong. 
 


> Failure of one mapper should not affect other mappers in MR index build
> ---
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows arriving 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?





[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704332#comment-14704332
 ] 

James Taylor commented on PHOENIX-2154:
---

Did you try it, Ravi? In theory, the mapper would call the HBase APIs, so when 
the reducer runs, all the mapper tasks have been completed.

> Failure of one mapper should not affect other mappers in MR index build
> ---
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows arriving 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?





[jira] [Comment Edited] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-19 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704332#comment-14704332
 ] 

James Taylor edited comment on PHOENIX-2154 at 8/20/15 5:50 AM:


Did you try it, Ravi? In theory, the mapper would call the HBase APIs, so when 
the reducer runs, all the mapper tasks have been completed. We don't want 
Phoenix to send the work to HBase and we definitely don't want auto commit on, 
as that would cause an RPC for every row.


was (Author: jamestaylor):
Did you try it, Ravi? In theory, the mapper would call the HBase APIs, so when 
the reducer runs, all the mapper tasks have been completed.

> Failure of one mapper should not affect other mappers in MR index build
> ---
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows arriving 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?





[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-19 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704376#comment-14704376
 ] 

Samarth Jain commented on PHOENIX-2154:
---

Reducers could start running before all the mappers are complete. However, the 
reduce step in a reducer is not executed until all the mappers are done. Ravi, 
it looks like you are updating the index state in the setUp method. Do you know 
when setUp is executed? Is it executed before the shuffle phase of a reducer? 
If yes, it probably makes sense to move the code you have in setUp to the 
reduce method instead, as sketched below.
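A minimal sketch of that suggestion, with hypothetical class and helper names 
(none of these are from the attached patch):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: IndexStateReducer and updateIndexState() are hypothetical names.
public class IndexStateReducer
        extends Reducer<ImmutableBytesWritable, NullWritable, NullWritable, NullWritable> {

    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<NullWritable> values,
            Context context) throws IOException, InterruptedException {
        // reduce() is only invoked once the shuffle has gathered all mapper
        // output, so every mapper has completed by this point.
        updateIndexState(context.getConfiguration());
    }

    private void updateIndexState(Configuration conf) {
        // hypothetical helper: flip the index state via Phoenix APIs
    }
}
{code}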

> Failure of one mapper should not affect other mappers in MR index build
> ---
>
> Key: PHOENIX-2154
> URL: https://issues.apache.org/jira/browse/PHOENIX-2154
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: maghamravikiran
> Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows arriving 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?


