[jira] [Commented] (PHOENIX-2031) Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader
[ https://issues.apache.org/jira/browse/PHOENIX-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585487#comment-14585487 ]

Prashant Kommireddi commented on PHOENIX-2031:
----------------------------------------------

LGTM. [~aliciashu], it would be nice to have one of the test cases check the actual Date value in addition to the non-null check you are making. The changes are good otherwise. Thanks for the contribution, Alicia!

> Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader
> --------------------------------------------------------------------------------------------------
>
>         Key: PHOENIX-2031
>         URL: https://issues.apache.org/jira/browse/PHOENIX-2031
>     Project: Phoenix
>  Issue Type: Bug
>    Reporter: Alicia Ying Shu
>    Assignee: Alicia Ying Shu
> Attachments: PHOENIX-2031.patch
>
> 2015-05-11 15:41:44,419 WARN main org.apache.hadoop.mapred.YarnChild:
> Exception running child : org.apache.pig.PigException: ERROR 0: Error transforming PhoenixRecord to Tuple Cannot convert a Unknown to a java.sql.Timestamp
>   at org.apache.phoenix.pig.util.TypeUtil.transformToTuple(TypeUtil.java:293)
>   at org.apache.phoenix.pig.PhoenixHBaseLoader.getNext(PhoenixHBaseLoader.java:197)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
>   at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>   at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2031) Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader
[ https://issues.apache.org/jira/browse/PHOENIX-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585477#comment-14585477 ]

maghamravikiran commented on PHOENIX-2031:
------------------------------------------

The patch looks good [~giacomotaylor] [~ayingshu]. One minor request: can we validate that the passed-in DATE value matches the value returned from the tuple, in addition to the current non-null check? This will ensure we don't see any issues arising from time zones.

> Unable to process timestamp/Date data loaded via Phoenix org.apache.phoenix.pig.PhoenixHBaseLoader
> --------------------------------------------------------------------------------------------------
>
>         Key: PHOENIX-2031
>         URL: https://issues.apache.org/jira/browse/PHOENIX-2031
>     Project: Phoenix
>  Issue Type: Bug
>    Reporter: Alicia Ying Shu
>    Assignee: Alicia Ying Shu
> Attachments: PHOENIX-2031.patch
>
> 2015-05-11 15:41:44,419 WARN main org.apache.hadoop.mapred.YarnChild:
> Exception running child : org.apache.pig.PigException: ERROR 0: Error transforming PhoenixRecord to Tuple Cannot convert a Unknown to a java.sql.Timestamp
>   at org.apache.phoenix.pig.util.TypeUtil.transformToTuple(TypeUtil.java:293)
>   at org.apache.phoenix.pig.PhoenixHBaseLoader.getNext(PhoenixHBaseLoader.java:197)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
>   at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>   at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
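Both reviewers are asking for a value assertion rather than just a null check. A minimal, self-contained sketch of the time-zone-safe comparison they have in mind (the epoch value and the `roundTrips` helper are hypothetical illustrations, not code from the patch):

```java
import java.sql.Date;

public class DateRoundTrip {
    // Returns true when a Date survives a write/read round trip.
    // Comparing epoch millis via getTime() keeps the check independent of
    // the JVM's default time zone, which Date.toString() depends on.
    static boolean roundTrips(long epochMillis) {
        Date written = new Date(epochMillis);
        // Stand-in for the value read back from the Pig tuple.
        Date readBack = new Date(written.getTime());
        return written.getTime() == readBack.getTime();
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1431384104419L)); // true
    }
}
```

The point of the comparison style: asserting on `getTime()` (or on `Date` equality, which delegates to it) catches a value shifted by a time-zone offset, which a non-null assertion never would.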
[jira] [Commented] (PHOENIX-1981) PhoenixHBase Load and Store Funcs should handle all Pig data types
[ https://issues.apache.org/jira/browse/PHOENIX-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585464#comment-14585464 ]

ASF GitHub Bot commented on PHOENIX-1981:
-----------------------------------------

Github user elilevine commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/85#discussion_r32393260

    --- Diff: phoenix-pig/src/test/java/org/apache/phoenix/pig/util/TypeUtilTest.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.phoenix.pig.util;
    +
    +import static org.junit.Assert.assertEquals;
    +import static org.mockito.Mockito.mock;
    +import static org.mockito.Mockito.when;
    +
    +import java.math.BigDecimal;
    +import java.math.BigInteger;
    +import java.util.List;
    +
    +import org.apache.phoenix.pig.writable.PhoenixPigDBWritable;
    +import org.apache.pig.ResourceSchema.ResourceFieldSchema;
    +import org.apache.pig.data.DataType;
    +import org.apache.pig.data.Tuple;
    +import org.junit.Test;
    +
    +import com.google.common.collect.Lists;
    +
    +public class TypeUtilTest {
    +
    +    @Test
    +    public void testTransformToTuple() throws Exception {
    +        PhoenixPigDBWritable record = mock(PhoenixPigDBWritable.class);
    +        List values = Lists.newArrayList();
    +        values.add("213123");
    +        values.add(1231123);
    +        values.add(31231231232131L);
    +        values.add("bytearray".getBytes());
    +        when(record.getValues()).thenReturn(values);
    +
    +        ResourceFieldSchema field = new ResourceFieldSchema().setType(DataType.CHARARRAY);
    --- End diff --

@prashantkommireddi, mind making the few small changes Jesse asked for? Would be nice to get this in on Monday. Thanks!

> PhoenixHBase Load and Store Funcs should handle all Pig data types
> ------------------------------------------------------------------
>
>         Key: PHOENIX-1981
>         URL: https://issues.apache.org/jira/browse/PHOENIX-1981
>     Project: Phoenix
>  Issue Type: Improvement
>    Reporter: Prashant Kommireddi
>    Assignee: Prashant Kommireddi
>
> The load and store func (Pig integration) currently do not handle all Pig types. Here is a complete list:
> http://pig.apache.org/docs/r0.13.0/basic.html#data-types
> In addition to handling all simple types (BigInteger and BigDecimal are missing in the LoadFunc currently), we should also look into handling complex Pig types.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2021) Implement ARRAY_CAT built in function
[ https://issues.apache.org/jira/browse/PHOENIX-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585434#comment-14585434 ]

ramkrishna.s.vasudevan commented on PHOENIX-2021:
-------------------------------------------------

Thanks for the updated patch [~Dumindux]

{code}
int nullsInMiddleAfterConcat = nullsAtTheEndOfArray1 + nullsAtTheBeginningOfArray2;
int bytesForNullsBefore = nullsAtTheBeginningOfArray2 / 255 + nullsAtTheBeginningOfArray2 % 255 == 0 ? 0 : 1;
int bytesForNullsAfter = nullsInMiddleAfterConcat / 255 + nullsInMiddleAfterConcat % 255 == 0 ? 0 : 1;
//Increase of length required to store nulls
int lengthIncreaseForNulls = bytesForNullsAfter - bytesForNullsBefore;
//Length increase incremented by one when there were no nulls at the beginning of array and when there are
//nulls at the end of array 1 as we need to allocate a byte for separator byte in this case.
lengthIncreaseForNulls += nullsAtTheBeginningOfArray2 == 0 && nullsAtTheEndOfArray1 != 0 ? Bytes.SIZEOF_BYTE : 0;
int newOffsetArrayPosition = offsetArrayPositionArray1 + offsetArrayPositionArray2 + lengthIncreaseForNulls - 2 * Bytes.SIZEOF_BYTE;
{code}

Take the case where array 1 had 10 nulls and array 2 had 246 nulls, so the total number of nulls after concat is 256 (in the middle). In both cases you are only counting the bytes needed to write the number of nulls, leaving out the SEPARATOR_BYTE (no problem in that). So bytesForNullsBefore and bytesForNullsAfter are each going to be 1. Now, after concatenation, since the result is going to have a run of 256 nulls, the nulls serialization is going to need an extra byte. But will the above logic account for that? You can add test cases with the different nulls scenarios, naming the tests after what they cover (even though the names get longer, this may help to identify all the cases).
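For reference, the byte count the reviewer is reasoning about is a ceiling division (the arithmetic in the patch assumes at most 255 nulls per count byte; this sketch follows that assumption and is not verified against Phoenix's actual array serialization). Written with explicit parentheses, the reviewer's 10/246/256 scenario looks like this:

```java
public class NullBytes {
    // Bytes needed to encode a run of n nulls when each count byte can
    // represent at most 255 nulls: a plain ceiling division.
    static int bytesForNulls(int n) {
        return n / 255 + ((n % 255 == 0) ? 0 : 1);
    }

    public static void main(String[] args) {
        System.out.println(bytesForNulls(10));  // 1 (nulls at the end of array 1)
        System.out.println(bytesForNulls(246)); // 1 (nulls at the beginning of array 2)
        System.out.println(bytesForNulls(256)); // 2 (the concatenated run needs an extra byte)
    }
}
```

This is exactly the reviewer's point: each input run fits in one count byte, but the merged run of 256 spills into a second byte, so computing `bytesForNullsAfter - bytesForNullsBefore` from only one side's count under-allocates.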
Regarding using ArrayModifier - I would say we should rename the ArrayModifierFunction APIs, because at the end of the day both are going to work with Expressions - in the Prepend/Append cases they are single elements, whereas here they are two arrays. So if we can say LHSExpression and RHSExpression and make our checks and conditions similar, then we should be good with it. What I mean is that getArrayExpr and getElementExpr can be made getLHSExpr and getRHSExpr. The same would apply for ArrayConcat. Similarly, getDataType() can be written as getLHSDataType() and getRHSDataType(). Yes, seeing ArrayConcat alone we may not directly infer what is RHS and what is LHS, but the name of the function and the javadoc should help to understand that. The main aim is code reusability - just one base class should serve all these purposes.

> Implement ARRAY_CAT built in function
> -------------------------------------
>
>         Key: PHOENIX-2021
>         URL: https://issues.apache.org/jira/browse/PHOENIX-2021
>     Project: Phoenix
>  Issue Type: Sub-task
>    Reporter: Dumindu Buddhika
>    Assignee: Dumindu Buddhika
> Attachments: PHOENIX-2021-v3.patch, PHOENIX-2021.patch
>
> Ex:
> ARRAY_CAT(ARRAY[2, 3, 4], ARRAY[4, 5, 6]) = ARRAY[2, 3, 4, 4, 5, 6]
> ARRAY_CAT(ARRAY["a", "b"], ARRAY["c", "d"]) = ARRAY["a", "b", "c", "d"]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
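Concretely, the renaming proposal amounts to a small shared base class along these lines (a hypothetical sketch of the reviewer's suggestion, not the actual Phoenix class; `Object` stands in for Phoenix's Expression type):

```java
// Hypothetical base class per the review comment: ARRAY_APPEND/ARRAY_PREPEND
// pass an array and an element, ARRAY_CAT passes two arrays, but all three
// can share generic LHS/RHS accessors instead of array/element-specific ones.
abstract class ArrayModifierFunctionSketch {
    private final Object lhsExpr; // e.g. the array in ARRAY_APPEND, the left array in ARRAY_CAT
    private final Object rhsExpr; // e.g. the element in ARRAY_APPEND, the right array in ARRAY_CAT

    ArrayModifierFunctionSketch(Object lhsExpr, Object rhsExpr) {
        this.lhsExpr = lhsExpr;
        this.rhsExpr = rhsExpr;
    }

    Object getLHSExpr() { return lhsExpr; } // replaces getArrayExpr()
    Object getRHSExpr() { return rhsExpr; } // replaces getElementExpr()
}
```

The design point is the one the reviewer makes: once the accessors are side-based rather than role-based, the shared checks and conditions in the base class apply unchanged to append, prepend, and concat.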
Re: YCSB binding for Phoenix
honestly, most of the places I see YCSB put to good use are in stress testing deployments of some particular back-end. Comparing back-end systems still happens, but it's less common, since you have to have more knowledge of each back-end system than most beginners have. It's one of the areas I'm hoping to improve, but that's a horse of a different color. Also, we'd love to get more realistic workloads put together in the YCSB project; hopefully y'all don't mind us potentially stealing from Pherf. ;)

On Fri, Jun 12, 2015 at 7:13 PM, James Taylor wrote:

> As far as configuration, it depends on the cluster size and data sizes. It'd be nice to have the configuration defaults adapt based on the cluster size, but they don't today. I'm not too familiar with the YCSB tests.
>
> FWIW, we initially planned to try to use YCSB internally for perf testing, but we ended up going a different route, creating our Pherf tool (http://phoenix.apache.org/pherf.html). The design goals were really different: Pherf is for baselining perf at scale over realistic workloads, to get an idea of what perf looks like from release to release (it'll also do functional testing at scale). YCSB seems more targeted at comparing different back-end systems against each other.
>
> Thanks,
> James
>
> On Fri, Jun 12, 2015 at 3:24 PM, Andrew Purtell wrote:
>
> > Thanks.
> >
> > That was more of a question for the Phoenix folks around here: are there any configuration options or particular JDBC idioms that might produce better results than what the generic driver is doing?
> >
> > On Fri, Jun 12, 2015 at 12:26 PM, Sean Busbey wrote:
> >
> > > Yes, for now. I know that there are some problems using the JDBC driver with Oracle (see #128).
> > >
> > > I could put a note in our test plan to check Phoenix with the JDBC driver. It'd be even better if one of y'all could sign up to do the testing. We'll have a plan together Monday.
> > >
> > > On Thu, Jun 11, 2015 at 12:10 AM, Andrew Purtell wrote:
> > >
> > > > Pull #178 is only the existing JDBC driver with additional dependencies in the POM, support files, and a different connect string, right?
> > > >
> > > > What more/different might it be than the existing JDBC driver? I'm thinking of anything other than vanilla JDBC that Pherf might do.
> > > >
> > > > On Tue, Jun 9, 2015 at 7:45 PM, Sean Busbey wrote:
> > > >
> > > > > Hi folks!
> > > > >
> > > > > The YCSB community is reawakening, and we're prepping to start regular releases again. The first of these is hitting feature freeze on Monday June 15th.
> > > > >
> > > > > There's an old PR for adding Phoenix support[1], but it has unsurprisingly gone stale during YCSB's long hiatus. I'd like to get Phoenix support in sooner rather than later. We're aiming at monthly releases, so this coming Monday isn't a hard deadline.
> > > > >
> > > > > Any chance someone from the Phoenix community could take a look?
> > > > >
> > > > > [1]: https://github.com/brianfrankcooper/YCSB/pull/178
> > > > >
> > > > > --
> > > > > Sean
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> > >
> > > --
> > > Sean
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

--
Sean
Re: [jira] [Updated] (PHOENIX-1118) Provide a tool for visualizing Phoenix tracing information
Hi All,

On the explain plan, a graph is shown to indicate which part of the code is run where [1]. The default chart will be a pie chart, and I'm planning to offer a few more chart types so the user can pick his choice. If any node is responding slowly, the Phoenix database administrator can examine that node and see which queries ran at a particular time.

I have run a few examples on secondary indexes [4], and the sample data I got can be used for milestone 1 (end of this week). It is shown with time-sliding capabilities; trace segments are shown in a timeline [2].

Does "filters" mean 'where'-like logic statements? The database admin can track the duration of a particular trace from the timeline visualization, so he can use the filters effectively (best order of the filters) in a query to get a quick response.

I tried the join query and it didn't give any results or corresponding traces. This is the reference I followed [3]. Are there any more steps to follow?

To visualize the tracing details I looked through a few charting libraries, and I will give comparison details on them. Please feel free to give feedback on the mock UIs.

Thanks.

[1] https://issues.apache.org/jira/secure/attachment/12739498/m1-mockUI-tracedistribution.png
[2] https://issues.apache.org/jira/secure/attachment/12739499/m1-mockUI-tracetimeline.png
[3] https://phoenix.apache.org/joins.html
[4] http://ayolajayamaha.blogspot.com/2015/06/tracing-data-secondary-indixes.html

On Thu, Jun 11, 2015 at 11:39 AM, Ayola Jayamaha wrote:

> Yes. It was a bit confusing :-). But it was useful to get a good idea of the use cases.
> Thanks.
>
> On Wed, Jun 10, 2015 at 11:57 PM, James Taylor wrote:
>
> > Excellent, Nishani (and you forgot to say "rambling" :-), but I'm glad it helped).
> >
> > On Wed, Jun 10, 2015 at 11:16 AM, Ayola Jayamaha wrote:
> >
> > > Hi James,
> > >
> > > Thanks a lot for the lengthy and descriptive reply. I am currently looking through UI components and charting libraries that can be used for the UI. I referred to [1] with regard to your explanation and came up with some mockups which I will share soon.
> > >
> > > Thanks,
> > > Nishani
> > >
> > > [1] https://phoenix.apache.org/language/#index_hint
> > > [2] https://phoenix.apache.org/faq.html#How_do_I_create_Secondary_Index_on_a_table
> > >
> > > On Tue, Jun 9, 2015 at 11:39 PM, James Taylor wrote:
> > >
> > > > Hi Nishani,
> > > > I'd recommend focusing on higher level use cases. From the user's point of view, they're executing a query and for some reason it's slower than they expect. How do they figure out why?
> > > >
> > > > They might first do an EXPLAIN on their query to see how Phoenix is executing it. Which parts are run where? Are secondary indexes being used as expected? Are filters being pushed down as expected? A better way to visualize the explain plan might be a good thing for you to start with.
> > > >
> > > > Second, assuming the explain plan looks good, they'll want to turn on tracing so that they can get runtime information on which parts of their query are taking the longest.
> > > >
> > > > Maybe more than one Phoenix table is involved - how will you display the tracing information across multiple tables for a query that does a join? Maybe you can punt on this first pass, and focus on single table queries. A related use case would be a DML statement that's executed and taking longer than expected. Let's say that the table being updated has one or more secondary indexes, so the writes are also updating the index tables. Seeing the entire picture of both the table writes plus the index writes on the same graph would be great.
> > > >
> > > > For the single-table query use case, what does the distribution of time look like across all the region servers participating in the query? Maybe some kind of graph that shows quickly if one region server is taking much more time than the others. Perhaps that's an indication that the table statistics need to be re-run, as there may be skew that's developed such that one of the threads is handling more data than it should. Or perhaps there's an issue with that particular region server. Was there something else going on at the same time on that region server, like a background compaction/split process? If that information is available in the trace table (not sure), it would be very cool to be able to superimpose that on top of the query trace graph.
> > > >
> > > > Another test might be to run a query over a different table and see if the same region server shows up again as being slow. So superimposing the query trace graphs of multiple queries might give the user some insight.
> > > >
> > > > IMHO, this is the kind of angle you should come at this from.
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > On Mon, Jun 8, 2015 at 4:12 AM, Ayola Jayamaha wrote:
> > > > > Hi All,
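The "is one region server slower than the others" view James describes reduces to aggregating span durations per host. A minimal in-memory sketch (the `Span` shape loosely mirrors a trace-table row with hostname and start/end times; in the real tool the rows would come from Phoenix's tracing table, whose exact schema is not shown here):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TraceAggregation {
    // One trace span: hostname plus start/end in epoch millis.
    record Span(String host, long start, long end) {}

    // Total time spent per host; an outlier here is the "slow region server"
    // signal the per-host distribution chart would surface.
    static Map<String, Long> totalTimePerHost(Span[] spans) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Span s : spans) {
            totals.merge(s.host(), s.end() - s.start(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Span[] spans = {
            new Span("rs1", 0, 120), new Span("rs2", 0, 95),
            new Span("rs1", 200, 260), new Span("rs3", 0, 900), // rs3 is the outlier
        };
        System.out.println(totalTimePerHost(spans)); // {rs1=180, rs2=95, rs3=900}
    }
}
```

Feeding per-host totals like these into the pie chart (trace distribution over nodes) and the per-span rows into the timeline view covers both of the milestone 1 mock UIs.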
[jira] [Updated] (PHOENIX-1118) Provide a tool for visualizing Phoenix tracing information
[ https://issues.apache.org/jira/browse/PHOENIX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishani updated PHOENIX-1118:
-----------------------------
    Attachment: m1-mockUI-tracetimeline.png
                m1-mockUI-tracedistribution.png

Mock UI for milestone1:
1. Understanding the tracing distribution over the nodes
2. Visualizing tracing details over a time period

> Provide a tool for visualizing Phoenix tracing information
> ----------------------------------------------------------
>
>         Key: PHOENIX-1118
>         URL: https://issues.apache.org/jira/browse/PHOENIX-1118
>     Project: Phoenix
>  Issue Type: Sub-task
>    Reporter: James Taylor
>    Assignee: Nishani
>      Labels: Java, SQL, Visualization, gsoc2015, mentor
> Attachments: MockUp1-TimeSlider.png, MockUp2-AdvanceSearch.png, MockUp3-PatternDetector.png, MockUp4-FlameGraph.png, Screenshot of dependency tree.png, m1-mockUI-tracedistribution.png, m1-mockUI-tracetimeline.png, screenshot of tracing web app.png
>
> Currently there's no means of visualizing the trace information provided by Phoenix. We should provide some simple charting over our metrics tables. Take a look at the following JIRA for sample queries:
> https://issues.apache.org/jira/browse/PHOENIX-1115?focusedCommentId=14323151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14323151

-- This message was sent by Atlassian JIRA (v6.3.4#6332)