Re: Spark 2.1.1 (Scala 2.11.8) write to Phoenix 4.7 (HBase 1.1.2)

2018-01-31 Thread Josh Mahonin
Hi, As per https://phoenix.apache.org/phoenix_spark.html, Apache Phoenix is compiled against Spark 2 only in versions 4.10 and above. If you must use Phoenix 4.7 against Spark 2.x, you may need to apply PHOENIX- yourself:

Re: [ANNOUNCE] New PMC Member: Sergey Soldatov

2017-09-25 Thread Josh Mahonin
Congratulations Sergey! On Sun, Sep 24, 2017 at 4:05 PM, Ted Yu wrote: > Congratulations, Sergey ! > > On Sun, Sep 24, 2017 at 1:00 PM, Josh Elser wrote: > >> All, >> >> The Apache Phoenix PMC has recently voted to extend an invitation to >> Sergey to

Re: Use Phoenix hints with Spark Integration [main use case: block cache disable]

2017-08-31 Thread Josh Mahonin
Hi Roberto, At present, I don't believe there's any way to pass a query hint explicitly, as the SELECT statement is built based on the table name and columns, down in this method:

Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Josh Mahonin
for the inputs. > > > Kanagha > > On Thu, Aug 17, 2017 at 7:17 AM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi, >> >> Phoenix is able to parallelize queries based on the underlying HBase >> region splits, as well as its own internal guideposts based o

Re: Custom Connector for Prestodb

2017-08-17 Thread Josh Mahonin
Hi Luqman, I just responded to another query on the list about phoenix-spark that may help shed some light. In addition, the preferred locations the phoenix-spark connector exposes are determined in the general PhoenixInputFormat MapReduce code [1]. I'm not very familiar with PrestoDB, but if

Re: phoenix spark options not supporting query in dbtable

2017-08-17 Thread Josh Mahonin
Hi, Phoenix is able to parallelize queries based on the underlying HBase region splits, as well as its own internal guideposts based on statistics collection [1]. The phoenix-spark connector exposes those splits to Spark for the RDD / DataFrame parallelism. In order to test this out, you can try
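
A minimal sketch of such a test (the table name and ZooKeeper quorum below are placeholders, and this assumes the DataFrame integration documented at https://phoenix.apache.org/phoenix_spark.html):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Load a Phoenix table; the partition count mirrors the splits
    // the Phoenix query planner produces (regions + guideposts)
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")
      .option("zkUrl", "localhost:2181")
      .load()

    println(df.rdd.partitions.size)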

Re: Apache Spark Integration

2017-07-19 Thread Josh Mahonin
Hi Luqman, At present, the phoenix-spark integration relies on the schema having been already created. There has been some discussion of augmenting the supported Spark 'SaveMode's to include 'CREATE IF NOT EXISTS' logic. https://issues.apache.org/jira/browse/PHOENIX-2745
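
For reference, a save sketch against a pre-created table (names are placeholders; per the documentation the table must already exist, and SaveMode.Overwrite is required even though the operation is an upsert rather than a truncate-and-replace):

    import org.apache.spark.sql.SaveMode

    // 'df' is a DataFrame whose columns match a table created beforehand, e.g.:
    //   CREATE TABLE OUTPUT_TABLE (ID BIGINT NOT NULL PRIMARY KEY, COL1 VARCHAR)
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "OUTPUT_TABLE")
      .option("zkUrl", "localhost:2181")
      .save()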

Re: Exception ERROR 201 (22000): Illegal data. Expected length of at least 8 bytes, but had 4

2017-07-06 Thread Josh Mahonin
and provide it if possible. > > By the way, changing all local indexes to all global indexes no longer > causes the exception. > > It seems there is a problem with the local index. > > > Thanks, > > Takashi > > > 2017-07-05 22:35 GMT+09:00 Josh Mahonin <jmaho.

Re: Exception ERROR 201 (22000): Illegal data. Expected length of at least 8 bytes, but had 4

2017-07-05 Thread Josh Mahonin
Hi, >From the logs you attached, it appears that you're getting the exception on the following query: SELECT trid, tid, frtp, frno, gzid, ontm, onty, onlt, onln, oftm, ofty, oflt, ofln, onwk, onhr, wday, dist, drtn, delf, sntf, cdtm, udtm FROM trp WHERE tid = ? AND delf = FALSE ORDER BY oftm

Re: Getting Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.phoenix.spark. Please find packages at http://spark-packages.org Exception

2017-03-16 Thread Josh Mahonin
Hi Sateesh, It seems you are missing the import which gives Spark visibility into the "org.apache.phoenix.spark" package. From the documentation page: *import org.apache.phoenix.spark._* I'm not entirely sure how this works in Java, however. You might have some luck with: *import static
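
In Scala, that import pulls in the implicit conversions that add the Phoenix methods onto SparkContext and RDDs, roughly like so (the table and quorum are placeholders):

    import org.apache.phoenix.spark._  // implicits add phoenixTableAsRDD et al.

    val rdd = sc.phoenixTableAsRDD(
      "MY_TABLE",
      Seq("ID", "COL1"),
      zkUrl = Some("localhost:2181"))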

Re: PySpark and Phoenix Dynamic Columns

2017-02-24 Thread Josh Mahonin
Hi Craig, I think this is an open issue in PHOENIX-2648 ( https://issues.apache.org/jira/browse/PHOENIX-2648) There seems to be a workaround by using a 'VIEW' instead, as mentioned in that ticket. Good luck, Josh On Thu, Feb 23, 2017 at 11:56 PM, Craig Roberts

Re: Still having issues

2017-02-16 Thread Josh Mahonin
It still seems that Spark is unable to find all of the Phoenix/HBase classes that are necessary. As a reference, I've got a Docker image that might help: https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark The versions of Phoenix and Spark it uses are a bit out of date, but it shows

Re: FW: Failing on writing Dataframe to Phoenix

2017-02-15 Thread Josh Mahonin
Hi, Spark is unable to load the Phoenix classes it needs. If you're using a recent version of Phoenix, please ensure the "fat" *client* JAR (or for older versions of Phoenix, the Phoenix *client*-spark JAR) is on your Spark driver and executor classpath [1]. The 'phoenix-spark' JAR is

Re: Phoenix-Spark Integration Java Code Throwing org.apache.hadoop.mapred.InvalidJobConfException Exception

2017-02-01 Thread Josh Mahonin
Hi Ravi, It looks like you're invoking the PhoenixInputFormat class directly from Spark, which actually bypasses the phoenix-spark integration completely. Others on the list might be more helpful with regards to Java implementation, but I suspect if you start with using the DataFrame API,

Re: Moving column family into new table

2017-01-19 Thread Josh Mahonin
ll. Do you have any other architecture recommendations for our use case? > Would storing the images directly in HBase be any better? > > On Thu, Jan 19, 2017 at 12:02 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi Mark, >> >> At present, the Spark partitions are

Re: Moving column family into new table

2017-01-19 Thread Josh Mahonin
Hi Mark, At present, the Spark partitions are basically equivalent to the number of regions in the underlying HBase table. This is typically something you can control yourself, either using pre-splitting or salting ( https://phoenix.apache.org/faq.html#Are_there_any_tips_for_optimizing_Phoenix).

Re: how to write spark2 dataframe to phoenix?

2017-01-11 Thread Josh Mahonin
Hi, Spark 2.x isn't currently supported in a released Phoenix version, but is slated for the upcoming 4.10.0 release. If you'd like to compile your own version in the meantime, you can find the ticket/patch here: https://issues.apache.org/jira/browse/PHOENIX- or:

Re: Spark hang on load phoenix table

2016-11-16 Thread Josh Mahonin
Hi, Are there any logs in the Spark driver and executors which would help provide some context? In diagnosing, increasing the log level to DEBUG might be useful as well. Also, the snippet you posted is a 'lazy' operation. In theory it should return quickly, and only evaluate when some sort of
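
To illustrate the lazy/eager split (placeholder names): the load below returns immediately because it only builds a plan; nothing touches Phoenix or HBase until an action such as count() runs, which is where a hang would actually surface.

    // Lazy: returns immediately
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")
      .option("zkUrl", "localhost:2181")
      .load()

    // Eager: triggers the real scan against Phoenix/HBase
    val n = df.count()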

Re: Inserting into Temporary Phoenix table using Spark plugin

2016-11-16 Thread Josh Mahonin
Hi Hussain, I'm not familiar with the Spark temporary table syntax. Perhaps you can work around it by using other options, such as the DataFrame.save() functionality which is documented [1] and unit tested [2]. I suspect what you're encountering is a valid use case. If you could also file a JIRA

Re: Accessing phoenix tables in Spark 2

2016-10-07 Thread Josh Mahonin
pg >> r.getString(3), // HID >> >> ) >> } >> >> val data = new JdbcRDD(sc, createConnection, >> "SELECT DATUM, PG, HID, ..... WHERE DATUM >= ? * 1000 AND DATUM <= ? * >> 1000 and PG = ", >> lowerBound = 1364774400, upp

Re: Accessing phoenix tables in Spark 2

2016-10-07 Thread Josh Mahonin
Hi Mich, There's an open ticket about this issue here: https://issues.apache.org/jira/browse/PHOENIX- Long story short, Spark changed their API (again), breaking the existing integration. I'm not sure the level of effort to get it working with Spark 2.0, but based on examples from other

Re: bulk-delete spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Fabio, You could probably just execute a regular DELETE query from a JDBC call, which is generally safe to do either from the Spark driver or within an executor. As long as auto-commit is enabled, it's an entirely server side operation: https://phoenix.apache.org/language/#delete Josh On
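
A sketch of that approach (the table and column names are made up); with auto-commit enabled the DELETE executes inside the region servers, and no rows travel back to the client:

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:localhost")
    try {
      conn.setAutoCommit(true)  // required for the server-side delete path
      val deleted = conn.createStatement().executeUpdate(
        "DELETE FROM MY_TABLE WHERE CREATED_AT < CURRENT_DATE() - 30")
      println(s"Deleted $deleted rows")
    } finally {
      conn.close()
    }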

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Spark, what the CSV loader does, I'll > surely write to the mailing list, or open a Jira, or maybe even open a PR, > right? > > Thank you again > > #A.M. > > On 09/28/2016 05:10 PM, Josh Mahonin wrote: > > Hi Antonio, > > You're correct, the phoenix-spark output uses

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Antonio, You're correct, the phoenix-spark output uses the Phoenix Hadoop OutputFormat under the hood, which effectively does a parallel, batch JDBC upsert. It should scale depending on the number of Spark executors, RDD/DataFrame parallelism, and number of HBase RegionServers, though

Re: phoenix spark Plugin not working for spark 2.0

2016-09-26 Thread Josh Mahonin
Hi Dalin, It looks like Spark may have gone and broken their API again for Spark 2.0. Could you file a JIRA ticket please? Thanks, Josh On Mon, Sep 26, 2016 at 1:17 PM, dalin.qin wrote: > Hi I'm trying some test with spark 2.0 together with phoenix 4.8 . My > enviroment

Re: How to manually generate a salted row key?

2016-09-13 Thread Josh Mahonin
Hi Marica, Are you able to successfully write your rowkey without salting? If not, it could be that your 'generateRowKey' function is the culprit. FWIW, we have some code that does something similar, though we use 'getSaltedKey': // If salting, we need to prepend an empty byte to 'rowKey', then
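
For background, Phoenix's documented salting scheme prepends one byte computed as a hash of the rest of the row key, modulo the bucket count. A standalone sketch of the idea only — the hash below is a placeholder and not the one Phoenix uses, so consult SaltingUtil in the Phoenix source for the real implementation:

    // Prepend a salt byte = hash(rowKey) % saltBuckets
    def saltRowKey(rowKey: Array[Byte], saltBuckets: Int): Array[Byte] = {
      val hash = rowKey.foldLeft(0)((h, b) => h * 31 + b)  // placeholder hash
      val saltByte = (math.abs(hash) % saltBuckets).toByte
      saltByte +: rowKey
    }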

Re: When would/should I use spark with phoenix?

2016-09-13 Thread Josh Mahonin
> > Dalin > > > On Mon, Sep 12, 2016 at 5:36 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi Dalin, >> >> That's great to hear. Have you also tried reading back those rows through >> Spark for a larger "batch processing" job? Am

Re: When would/should I use spark with phoenix?

2016-09-12 Thread Josh Mahonin
Hi Dalin, That's great to hear. Have you also tried reading back those rows through Spark for a larger "batch processing" job? Am curious if you have any experiences or insight there from operating on a large dataset. Thanks! Josh On Mon, Sep 12, 2016 at 10:29 AM, dalin.qin

Re: When would/should I use spark with phoenix?

2016-09-11 Thread Josh Mahonin
Just to add to James' comment, they're indeed complementary and it all comes down to your own use case. Phoenix offers a convenient SQL interface over HBase, which is capable of doing very fast queries. If you're just doing insert / retrieval, it's unlikely that Spark will help you much there.

Re: TableNotFoundException, tableName=SYSTEM.CATALOG with phoenix-spark

2016-08-10 Thread Josh Mahonin
HBase master because there is no > 4.7.0-HBase-1.2 set in MVN. Is the phoenix-spark functionality confirmed to > work in 4.7 against HBase 1.2? > > > On Tue, Aug 9, 2016 at 7:37 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi Nathan, >> >> That

Re: Phoenix spark and dynamic columns

2016-07-27 Thread Josh Mahonin
Hi Paul, Unfortunately, out of the box the Spark integration doesn't support saving to dynamic columns. It's worth filing a JIRA enhancement for, and if you're interested in contributing a patch, here are the spots I think would need enhancing: The saving code derives the column names to

Re: NoClassDefFoundError org/apache/hadoop/hbase/HBaseConfiguration

2016-07-06 Thread Josh Mahonin
doop.hive.ql.metadata.Hive.(Hive.java:166) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) > ... > ... 73 more > ... > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16:

Re: Phoenix-Spark: is DataFrame saving a single threaded operation?

2016-07-05 Thread Josh Mahonin
Hi Vamsi, The DataFrame has an underlying number of partitions associated with it, which will be processed by however many workers you have in your Spark cluster. You can check the number of partitions with: df.rdd.partitions.size And you can alter the partitions using:
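
Given a DataFrame df loaded from Phoenix, plain Spark calls cover both operations (the target count of 200 is illustrative):

    println(df.rdd.partitions.size)  // inherited from the Phoenix splits

    // Standard Spark repartitioning; note this triggers a shuffle
    val repartitioned = df.repartition(200)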

Re: NoClassDefFoundError org/apache/hadoop/hbase/HBaseConfiguration

2016-07-05 Thread Josh Mahonin
Hi Robert, I recommend following up with HDP on this issue. The underlying problem is that the 'phoenix-spark-4.4.0.2.4.0.0-169.jar' they've provided isn't actually a fat client JAR, it's missing many of the required dependencies. They might be able to provide the correct JAR for you, but you'd

Re: phoenix spark options not supporting query in dbtable

2016-06-09 Thread Josh Mahonin
Hi Xindian, The phoenix-spark integration is based on the Phoenix MapReduce layer, which doesn't support aggregate functions. However, as you mentioned, both filtering and pruning predicates are pushed down to Phoenix. With an RDD or DataFrame loaded, all of Spark's various aggregation methods
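
For instance, given a DataFrame df loaded through phoenix-spark (column names hypothetical), the filter is pushed into the Phoenix scan while the grouping and average run in Spark:

    import org.apache.spark.sql.functions.avg

    val byHost = df
      .filter(df("ACTIVE") === true)  // pushed down to Phoenix
      .groupBy("HOST")                // executed by Spark
      .agg(avg("LATENCY"))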

Re: GenericMutableRow cannot be cast to org.apache.spark.sql.Row

2016-05-15 Thread Josh Mahonin
Hi Radha, I suggest you create a ticket with Hortonworks for this issue. I believe the root cause is that the version of Phoenix they have provided doesn't include all of the necessary patches for Spark 1.6 DataFrame support. Good luck, Josh On Thu, May 12, 2016 at 3:11 AM, Radha krishna

Re: Phoenix-Spark: Number of partitions in PhoenixRDD

2016-04-18 Thread Josh Mahonin
Hi Diego, The phoenix-spark RDD partition count is equal to the number of splits that the query planner returns. Adjusting the HBase region splits, table salting [1], as well as the guidepost width [2] should help with the parallelization here. Using 'EXPLAIN' for the generated query in sqlline

Re: Spark & Phoenix data load

2016-04-10 Thread Josh Mahonin
Hi Neelesh, The saveToPhoenix method uses the MapReduce PhoenixOutputFormat under the hood, which is a wrapper over the JDBC driver. It's likely not as efficient as the CSVBulkLoader, although there are performance improvements over a simple JDBC client as the writes are spread across multiple
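
A sketch of the RDD save path (the table and columns are placeholders, and the table must already exist); each Spark partition gets its own writer, so the upserts proceed in parallel across the cluster:

    import org.apache.phoenix.spark._

    val rows = sc.parallelize(Seq((1L, "foo"), (2L, "bar")))

    // Goes through PhoenixOutputFormat: batched JDBC upserts per partition
    rows.saveToPhoenix(
      "OUTPUT_TABLE",
      Seq("ID", "COL1"),
      zkUrl = Some("localhost:2181"))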

Re: [HELP:]Save Spark Dataframe in Phoenix Table

2016-04-10 Thread Josh Mahonin
? > > Thanks, > Divya > > Josh Mahonin <jmahonin@ > > On 9 April 2016 at 23:01, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi Divya, >> >> You don't have the phoenix client-spark JAR in your classpath, which is >> required for the phoenix-spar

Re: Spark Plugin Information

2016-04-10 Thread Josh Mahonin
amenode001.pnj3i.gradientx.com, > prod-nj3-namenode002.pnj3i.gradientx.com:2181” > > Is this correct? > > Thanks, > Ben > > > On Apr 9, 2016, at 8:06 AM, Josh Mahonin <jmaho...@gmail.com> wrote: > > Hi Ben, > > It looks like a connection URL issue. Are you pas

Re: Spark Plugin Information

2016-04-09 Thread Josh Mahonin
ssLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at jav

Re: [HELP:]Save Spark Dataframe in Phoenix Table

2016-04-09 Thread Josh Mahonin
-- Forwarded message -- > From: Divya Gehlot <divya.htco...@gmail.com> > Date: 8 April 2016 at 19:54 > Subject: Re: [HELP:]Save Spark Dataframe in Phoenix Table > To: Josh Mahonin <jmaho...@gmail.com> > > > Hi Josh, > I am doing in the same manner a

Re: [HELP:]Save Spark Dataframe in Phoenix Table

2016-04-08 Thread Josh Mahonin
Hi Divya, That's strange. Are you able to post a snippet of your code to look at? And are you sure that you're saving the dataframes as per the docs ( https://phoenix.apache.org/phoenix_spark.html)? Depending on your HDP version, it may or may not actually have phoenix-spark support.

Re: Phoenix DB Migration with Flyway

2016-03-08 Thread Josh Mahonin
word further? > > James > > > On Tuesday, March 8, 2016, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Hi all, >> >> Just thought I'd let you know that Flyway 4.0 was recently released, >> which includes support for DB migrations with Phoenix. >> >> https://flywaydb.org/blog/flyway-4.0 >> >> Josh >> >

Phoenix DB Migration with Flyway

2016-03-08 Thread Josh Mahonin
Hi all, Just thought I'd let you know that Flyway 4.0 was recently released, which includes support for DB migrations with Phoenix. https://flywaydb.org/blog/flyway-4.0 Josh
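
A sketch of wiring the two together with the Flyway 4.x API (the quorum and migration directory are placeholders):

    import org.flywaydb.core.Flyway

    val flyway = new Flyway()
    // Phoenix JDBC URLs take the form jdbc:phoenix:<zookeeper quorum>
    flyway.setDataSource("jdbc:phoenix:localhost", "", "")
    flyway.setLocations("filesystem:sql/migrations")
    flyway.migrate()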

Re: Spark Phoenix Plugin

2016-02-20 Thread Josh Mahonin
; > I get this error: > > java.lang.IllegalStateException: unread block data > > Thanks, > Ben > > > On Feb 19, 2016, at 11:12 AM, Josh Mahonin <jmaho...@gmail.com> wrote: > > What specifically doesn't work for you? > > I have a Docker image that I us

Re: Spark Phoenix Plugin

2016-02-19 Thread Josh Mahonin
> >>> >>> >>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me> wrote: >>> >>> This is the wrong client jar try with the one named >>> phoenix-4.7.0-HBase-1.1-client-spark.jar >>> >>> On Mon, 8 Feb 2016, 22:29 Benjamin

Re: Save dataframe to Phoenix

2016-02-17 Thread Josh Mahonin
Hi Krishna, There was some talk a few weeks ago about a new feature to allow creating / saving to tables dynamically, with schema inferred from the DataFrame. However, I don't believe a JIRA has been filed for it yet. As always, pull requests are appreciated. Josh On Tue, Feb 16, 2016 at 6:16

Re: Spark Phoenix Plugin

2016-02-08 Thread Josh Mahonin
Hi Ben, I'm not sure about the format of those command line options you're passing. I've had success with spark-shell just by setting the 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options on the spark config, as per the docs [1]. I'm not sure if there's anything special
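
The same settings expressed in code when building the SparkConf (the JAR path is a placeholder; point it at the Phoenix client-spark JAR for your version):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("phoenix-example")
      // the JAR must be readable at this path on the driver and every worker
      .set("spark.driver.extraClassPath", "/opt/phoenix/phoenix-client-spark.jar")
      .set("spark.executor.extraClassPath", "/opt/phoenix/phoenix-client-spark.jar")

    val sc = new SparkContext(conf)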

Re: Phoenix and Tableau

2016-01-28 Thread Josh Mahonin
Hey Thomas, That's pretty neat if I read that right. You're able to use Tableau with Phoenix using the Phoenix-Spark integration? Thanks! Josh On Thu, Jan 28, 2016 at 2:31 PM, Thomas Decaux wrote: > Yeah me too :/ i tried Spark , it works fine with Tableau on Mac. > >

Re: phoenix-spark and pyspark

2016-01-24 Thread Josh Mahonin
>> FYI, Nick - do you know about Josh's fix for PHOENIX-2599? Does that help >> here? >> >> On Fri, Jan 22, 2016 at 4:32 PM, Nick Dimiduk <ndimi...@gmail.com> wrote: >> >>> On Thu, Jan 21, 2016 at 7:36 AM, Josh Mahonin <jmaho...@gmail.com> >>&g

Re: phoenix-spark and pyspark

2016-01-21 Thread Josh Mahonin
atException.forInputString(NumberFormatException.java:65) >> at java.lang.Long.parseLong(Long.java:589) >> at java.lang.Long.parseLong(Long.java:631) >> at >> org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137) >> at >> o

Re: phoenix-spark and pyspark

2016-01-20 Thread Josh Mahonin
> dataframe access is now working. Let me see about updating the docs page to > be more clear, I'll send a patch by you for review. > > Thanks a lot for the help! > -n > > On Tue, Jan 19, 2016 at 5:59 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Right, this clus

Re: phoenix-spark and pyspark

2016-01-19 Thread Josh Mahonin
to DriverManager actually go through Spark's weird wrapper version of it. On Tue, Jan 19, 2016 at 7:36 PM, Nick Dimiduk <ndimi...@apache.org> wrote: > On Tue, Jan 19, 2016 at 4:17 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> What version of Spark are you using? >>

Re: phoenix-spark and pyspark

2016-01-19 Thread Josh Mahonin
d by the YARN runtime? > > On Tue, Jan 19, 2016 at 5:07 PM, Josh Mahonin <jmaho...@gmail.com> wrote: > >> Sadly, it needs to be installed onto each Spark worker (for now). The >> executor config tells each Spark worker to look for that file to add to its >> classpath, so once

Re: StaleRegionBoundaryCacheException with Phoenix 4.5.1-HBase 1.0 and Spark 1.4.1

2016-01-16 Thread Josh Mahonin
e: > >> With the fix of PHOENIX-2447 >> <https://issues.apache.org/jira/browse/PHOENIX-2447> in 4.7.0, now >> Phoenix will try multiple times for StaleRegionBoundaryCacheException until >> query timeout. >> >> Alicia >> >> From: Josh Mahonin

Re: StaleRegionBoundaryCacheException with Phoenix 4.5.1-HBase 1.0 and Spark 1.4.1

2016-01-14 Thread Josh Mahonin
Hi Li, I've not seen this error myself, though some searching returns a possible root cause: http://mail-archives.apache.org/mod_mbox/incubator-phoenix-user/201507.mbox/%3CCAAF1JdjNW98dAnxf3kx=ndkswyorpt1redb9dqwjbhvvlfn...@mail.gmail.com%3E Could you file a JIRA ticket for this please? It's

Re: Re: error when get data from Phoenix 4.5.2 on CDH 5.5.x by spark 1.5

2016-01-08 Thread Josh Mahonin
iffrent primary key which should be > (B,C). The data is big ,about 3T . Could you tell me how to do it > fastly.I have tried to do this by spark, but it seems not fast. > > > > Best wishes for you! > > > > > > -- &

Re: Re: error when get data from Phoenix 4.5.2 on CDH 5.5.x by spark 1.5

2016-01-06 Thread Josh Mahonin
bc.PhoenixDriver"); > val pred = s"RID like 'wtwb2%' and TIME between 1325440922 and 1336440922" > val rdd = sc.phoenixTableAsRDD( > "AIS_AREA", > Seq("MMSI","LON","LAT","RID"), > predicate = Some(pred),

Re: Re: error when get data from Phoenix 4.5.2 on CDH 5.5.x by spark 1.5

2015-12-30 Thread Josh Mahonin
park/libs/phoenix-4.6.0-HBase-1.1-client.jar, \ > > /data/public/spark/libs/phoenix-core-4.6.0-HBase-1.1.jar, \ > > /data/public/spark/libs/phoenix-spark-4.6.0-HBase-1.1.jar > > > Thank you > -- > sac...@outlook.com > > > *From:* Josh Mahonin

Re: error when get data from Phoenix 4.5.2 on CDH 5.5.x by spark 1.5

2015-12-29 Thread Josh Mahonin
Hi, This issue is fixed with the following patch, and using the resulting 'client-spark' JAR after compilation: https://issues.apache.org/jira/browse/PHOENIX-2503 As an alternative, you may have some luck also including updated com.fasterxml.jackson jackson-databind JARs in your app that are in

Re: Does phoenix spark support arbitrary SELECT statement?

2015-12-15 Thread Josh Mahonin
Hi Li, When using the DataFrame integration, it supports arbitrary SELECT statements. Column pruning and predicate filtering is pushed down to Phoenix, and aggregate functions are executed within Spark. When using RDDs directly, you can specify a table name, columns and an optional WHERE
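
Concretely (hypothetical table and columns), both the column pruning and the filter below end up in the SELECT that Phoenix executes, so only matching rows leave the region servers:

    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .option("table", "EVENTS")
      .option("zkUrl", "localhost:2181")
      .load()

    val subset = df.select("ID", "TS").filter(df("TS") > 1450000000L)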

Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

2015-12-10 Thread Josh Mahonin
;ja...@sandia.gov> wrote: > Josh, > > I added all of those JARs separately to Spark's class paths, and it seems > to be working fine now. > > Thanks a lot for your help! > > Sent from my iPhone > > On Dec 9, 2015, at 2:30 PM, Josh Mahonin <jmaho...@gmail.com> wrot

Re: phoenix-spark and pyspark

2015-12-10 Thread Josh Mahonin
Hey Nick, I think this used to work, and will again once PHOENIX-2503 gets resolved. With the Spark DataFrame support, all the necessary glue is there for Phoenix and pyspark to play nice. With that client JAR (or by overriding the com.fasterxml.jackson JARS), you can do something like: df =

Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

2015-12-09 Thread Josh Mahonin
oop. > > When I downloaded the Spark build with user provided Hadoop, and also > installed Hadoop manually, Spark works with Phoenix correctly! > > Thank you much, > Jonathan > > Sent from my iPhone > > On Dec 8, 2015, at 8:54 PM, Josh Mahonin <jmaho...@gmail.com> wrot

Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

2015-12-09 Thread Josh Mahonin
: > https://issues.apache.org/jira/browse/PHOENIX-2503 > > > > Multiple Java NoClass/Method Errors with Spark and Phoenix > > > > *From:* Josh Mahonin [mailto:jmaho...@gmail.com] > *Sent:* Wednesday, December 09, 2015 1:15 PM > > *To:* user@phoenix.apache.org > *Subject:*

Re: [EXTERNAL] Re: Confusion Installing Phoenix Spark Plugin / Various Errors

2015-12-09 Thread Josh Mahonin
"phoenix-server:2181") > > | ) > > warning: there were 1 deprecation warning(s); re-run with -deprecation for > details > > java.lang.NoSuchMethodError: > com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class; > >

Re: Confusion Installing Phoenix Spark Plugin / Various Errors

2015-12-08 Thread Josh Mahonin
Hi Jonathan, Spark only needs the client JAR. It contains all the other Phoenix dependencies as well. I'm not sure exactly what the issue you're seeing is. I just downloaded and extracted fresh copies of Spark 1.5.2 (pre-built with user-provided Hadoop), and the latest Phoenix 4.6.0 binary

Re: Problem with arrays in phoenix-spark

2015-11-30 Thread Josh Mahonin
Hi David, Thanks for the bug report and the proposed patch. Please file a JIRA and we'll take the discussion there. Josh On Mon, Nov 30, 2015 at 1:01 PM, Dawid Wysakowicz < wysakowicz.da...@gmail.com> wrote: > Hi, > > I've recently found some behaviour that I found buggy when working with >

RE: Phoenix-spark : NoClassDefFoundError: HBaseConfiguration

2015-11-06 Thread Josh Mahonin
Have you tried setting the SPARK_CLASSPATH or spark.driver.extraClassPath / spark.executor.extraClassPath as specified here? https://phoenix.apache.org/phoenix_spark.html Spark treats the JARs passed in with '--jars', and the class path, a little differently.

Re: integration Phoenix and Spark

2015-09-29 Thread Josh Mahonin
Make sure to double check your imports. Note the following from https://phoenix.apache.org/phoenix_spark.html import org.apache.spark.SparkContext import org.apache.spark.sql.SQLContext import org.apache.phoenix.spark._ There's also a sample repository here:

Re: Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row

2015-09-23 Thread Josh Mahonin
Hi Babar, Can you file a JIRA for this? I suspect this is something to do with the Spark 1.5 data frame API data structures, perhaps they've gone and changed them again! Can you try with previous Spark versions to see if there's a difference? Also, you may have luck interfacing with the RDDs

Re: Spark Plugin Exception - java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row

2015-09-23 Thread Josh Mahonin
tps://issues.apache.org/jira/browse/PHOENIX-2287> for this. And the code works fine with Spark 1.4.1. Thanks On Wed, Sep 23, 2015 at 6:06 AM Josh Mahonin <jmaho...@interset.com> wrote: Hi Babar, Can you file a JIRA for this? I suspect this is somet

Re: setting up community repo of Phoenix for CDH5?

2015-09-14 Thread Josh Mahonin
On Mon, Sep 14, 2015 at 9:21 AM, James Heather wrote: > I'm not certain of the best way to manage this. Perhaps we need a new > mailing list for those who want to help, to avoid cluttering this list up. Just my opinion, but maybe a tag in the email subject,

Re: JOIN issue, getting errors

2015-09-09 Thread Josh Mahonin
This looks suspiciously like https://issues.apache.org/jira/browse/PHOENIX-2169 Are you able to perform the same queries when the tables aren't salted? Also, what versions of HBase / Phoenix are you using? On Tue, Sep 8, 2015 at 12:33 PM, M. Aaron Bossert wrote: > All, > >

Re: Get values that caused the exception

2015-09-03 Thread Josh Mahonin
Hi Yiannis, I've found the best solution to this is generally just to add logging around that area. For example, you could add a try/catch (or a Scala Try) and check if an exception has been thrown, then log it somewhere. As a wild guess, if you're dealing with a Double datatype and getting
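
A sketch of that logging pattern (the field and parse step are invented for illustration):

    import scala.util.{Failure, Success, Try}

    def parseAmount(raw: String): Option[Double] =
      Try(raw.toDouble) match {
        case Success(v) => Some(v)
        case Failure(e) =>
          // log the exact offending value rather than just the exception
          System.err.println(s"Could not parse '$raw' as Double: ${e.getMessage}")
          None
      }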

Re: Getting ArrayIndexOutOfBoundsException in UPSERT SELECT

2015-08-26 Thread Josh Mahonin
Hi Jaime, I've run into similar issues with Phoenix 4.2.x. They seem to have gone away for me as of 4.3.1. Are you able to upgrade to that version or higher and try those same queries? Josh On Wed, Aug 26, 2015 at 3:30 PM, Jaime Solano jdjsol...@gmail.com wrote: Hi Yiannis, Not quite, but

Re: ERROR 201 (22000): Illegal data on Upsert Select

2015-08-20 Thread Josh Mahonin
Tracking in https://issues.apache.org/jira/browse/PHOENIX-2169 On Thu, Aug 20, 2015 at 5:14 PM, Samarth Jain sama...@apache.org wrote: Yiannis, Can you please provide a reproducible test case (schema, minimum data to reproduce the error) along with the phoenix and hbase versions so we can

Re: REG: Using Sequences in Phoenix Data Frame

2015-08-17 Thread Josh Mahonin
are supported by MR integration, but I'm not sure if their usage by the Spark integration would cause any issues. On Monday, August 17, 2015, Josh Mahonin jmaho...@interset.com wrote: Hi Satya, I don't believe sequences are supported by the broader Phoenix map-reduce integration, which

Re: Exception trying to write an ARRAY of UNSIGNED_SMALLINT

2015-08-04 Thread Josh Mahonin
Hi Riccardo, I think you've run into a bit of a mismatch between Scala and Java types. Could you please file a JIRA ticket for this with all the info above? You should be able to work around this by first converting your array contents to be java.lang.Short. I just tried this out and it worked
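
The workaround in miniature (data invented): box the Scala Shorts to java.lang.Short before building the row, so the array element type is what the Phoenix JDBC layer expects:

    val scalaArray: Array[Short] = Array(1, 2, 3)

    // Box each element; the boxed array can then go into the saved tuple
    val boxed: Array[java.lang.Short] = scalaArray.map(s => java.lang.Short.valueOf(s))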

Re: Using phoenix-spark plugin to insert an ARRAY Type

2015-07-30 Thread Josh Mahonin
Hi Riccardo, For saving arrays, you can use the plain old scala Array type. You can see the tests for an example: https://github.com/apache/phoenix/blob/master/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L408-L427 Note that saving arrays is only supported in Phoenix

Re: HBase's checkAndPut, Timestamp in Phoenix-Spark API

2015-07-18 Thread Josh Mahonin
Hi, The phoenix-spark integration is a thin wrapper around the phoenix-mapreduce integration, which under the hood just uses Phoenix's 'UPSERT' functionality for saving. As far as I know, there are no provisions for checkAndPut functionality there, so if you require it, I suggest sticking to the

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Josh Mahonin
This may or may not be helpful for your classpath issues, but I wanted to verify that basic functionality worked, so I made a sample app here: https://github.com/jmahonin/spark-streaming-phoenix This consumes events off a Kafka topic using spark streaming, and writes out event counts to Phoenix

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-08 Thread Josh Mahonin
Hi Jeroen, Have you tried using the phoenix-client uber JAR in the Spark classpath? That strategy I think is the simplest and most straight-forward, although it may not be appropriate for all projects. With your setup though, my guess is that Spark is preferring to use its own versions of Hadoop

Re: Persisting Objects thru Phoenix

2015-03-23 Thread Josh Mahonin
Hi Anirudha, We're presently using Phoenix with the Dropwizard framework, using JDBI: https://dropwizard.github.io/dropwizard/manual/jdbi.html As well, I did a small trial run with the Play 2 Framework using both Anorm and Ebean, which were successful. We ended up choosing Dropwizard instead

Re: Fwd: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString

2015-03-18 Thread Josh Mahonin
Have you tried adding hbase-protocol to the SPARK_CLASSPATH? That worked for me to get Spark playing nicely with Phoenix. On Tue, Mar 17, 2015 at 6:15 PM, Andrew Purtell apurt...@apache.org wrote: This is HBASE-8 (https://issues.apache.org/jira/browse/HBASE-8). Looks like someone else

Flyway DB Migrations

2015-01-14 Thread Josh Mahonin
Hi all, As an FYI, I've got a pull request into Flyway (http://flywaydb.org/) for Phoenix support: https://github.com/flyway/flyway/pull/930 I don't know what everyone else is using for schema management, if anything at all, but the preliminary support works well enough for Flyway's various

Re: Flyway DB Migrations

2015-01-14 Thread Josh Mahonin
On Wed, Jan 14, 2015 at 1:41 PM, Josh Mahonin jmaho...@interset.com wrote: Hi all, As an FYI, I've got a pull request into Flyway (http://flywaydb.org/) for Phoenix support: https://github.com/flyway/flyway/pull/930 I don't know what everyone else is using for schema management

Re: Re: Fwd: Phoenix in production

2015-01-08 Thread Josh Mahonin
On Wed, Jan 7, 2015 at 1:43 PM, anil gupta anilgupt...@gmail.com wrote: Yup, I am aware of Spark HBase integration. Phoenix-Spark integration would be more sweet. :) Hi Anil, I'm using Spark and Phoenix in production fairly successfully. There's very little required for integration, since

Problem with UPSERT SELECT with CHAR field

2014-08-18 Thread Josh Mahonin
Hi all, I'm having problems creating a join table when one of the fields involved is a CHAR. I have a reproducible test case below:

    -- Create source table
    CREATE TABLE IF NOT EXISTS SOURCE_TABLE(
      TID CHAR(3) NOT NULL,
      A UNSIGNED_INT NOT NULL,
      B UNSIGNED_INT NOT NULL
      CONSTRAINT pk PRIMARY