Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
1.0 root dir and add the following to root pom.xml: > hbase-spark > > Then you would be able to build the module yourself. > > hbase-spark module uses APIs which are compatible with hbase 1.0 > > Cheers > > On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Hi Ted, I see that you’re working on the hbase-spark module for hbase. I recently packaged the SparkOnHBase project and gave it a test run. It works like a charm on CDH 5.4 and 5.5. All I had to do was add /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the classpath.txt

Re: S3 Zip File Loading Advice

2016-03-09 Thread Benjamin Kim
ple files in each zip? Single file archives are processed just > like text as long as it is one of the supported compression formats. > > Regards > Sab > > On Wed, Mar 9, 2016 at 10:33 AM, Benjamin Kim <bbuil...@gmail.com> wrote: >

S3 Zip File Loading Advice

2016-03-08 Thread Benjamin Kim
I am wondering if anyone can help. Our company stores zipped CSV files in S3, which has been a big headache from the start. I was wondering if anyone has created a way to iterate through several subdirectories (s3n://events/2016/03/01/00, s3n://events/2016/03/01/01, etc.) in S3 to find the newest
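One way to approach the search described above is to generate the hourly prefixes for a given day and then pick the most recently modified object from whatever listing the S3 client returns. The sketch below is a minimal illustration in plain Python, not a tested solution from the thread. The `s3n://events` base and the `YYYY/MM/DD/HH` layout come from the example paths in the post; `hourly_prefixes`, `newest_zip`, and the `(key, last_modified)` listing shape are hypothetical names chosen for illustration, and the actual S3 listing call is left out.

```python
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple


def hourly_prefixes(base: str, day: date) -> List[str]:
    """Build the 24 hourly prefixes for one day, e.g. base/2016/03/01/00."""
    return [f"{base}/{day:%Y/%m/%d}/{hour:02d}" for hour in range(24)]


def newest_zip(listing: Iterable[Tuple[str, datetime]]) -> Optional[str]:
    """Return the key of the most recently modified .zip object, or None.

    `listing` is assumed to be (key, last_modified) pairs gathered from
    whichever S3 client is in use; that call is not shown here.
    """
    zips = [(modified, key) for key, modified in listing if key.endswith(".zip")]
    if not zips:
        return None
    # Tuples compare by timestamp first, so max() yields the newest object.
    return max(zips)[1]
```

Listing each prefix separately keeps every request small; whether that beats a single listing of the whole day depends on how many objects sit under each hour.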

Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop clsuter

2016-03-07 Thread Benjamin Kim
To comment… At my company, we have not gotten it to work in any mode other than local. If we try any of the yarn modes, it fails with a “file does not exist” error when trying to locate the executable jar. I mentioned this to the Hue users group, which we used for this, and they replied that

Re: SFTP Compressed CSV into Dataframe

2016-03-03 Thread Benjamin Kim
<sw...@snappydata.io> wrote: > > (-user) > > On Thursday 03 March 2016 10:09 PM, Benjamin Kim wrote: >> I forgot to mention that we will be scheduling this job using Oozie. So, we >> will not be able to know which worker node is going to be running this. >> If we try

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Benjamin Kim
I want to ask about something related to this. Does anyone know if there is or will be a command line equivalent of the spark-shell client for Livy Spark Server or any other Spark Job Server? The reason I am asking is that spark-shell does not handle multiple users on the same server well. Since a

SFTP Compressed CSV into Dataframe

2016-03-02 Thread Benjamin Kim
I wonder if anyone has opened an SFTP connection to read a remote GZIP CSV file? I am able to download the file first locally using the SFTP Client in the spark-sftp package. Then, I load the file into a dataframe using the spark-csv package, which automatically decompresses the file. I just
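The decompress-then-parse step mentioned above can be illustrated without Spark. The sketch below is only a plain-Python stand-in and does not use the spark-sftp or spark-csv APIs: it assumes the gzip-compressed bytes have already been fetched (the SFTP transfer itself is omitted), and `read_gzip_csv` is a hypothetical helper name.

```python
import csv
import gzip
import io
from typing import Dict, List


def read_gzip_csv(payload: bytes, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Decompress gzip bytes in memory and parse them as CSV with a header row."""
    # gzip.open accepts a file-like object, so nothing is written to local disk.
    with gzip.open(io.BytesIO(payload), mode="rt", encoding=encoding, newline="") as handle:
        return list(csv.DictReader(handle))
```

Keeping the payload in memory avoids depending on a local path, at the cost of holding the whole file in memory at once.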

Re: SparkOnHBase : Which version of Spark its available

2016-02-17 Thread Benjamin Kim
Ted, Any idea as to when this will be released? Thanks, Ben > On Feb 17, 2016, at 2:53 PM, Ted Yu wrote: > > The HBASE JIRA below is for HBase 2.0 > > HBase Spark module would be back ported to hbase 1.3.0 > > FYI > > On Feb 17, 2016, at 1:13 PM, Chandeep Singh

Re: spark 1.6.0 connect to hive metastore

2016-02-09 Thread Benjamin Kim
I got the same problem when I added the Phoenix plugin jar in the driver and executor extra classpaths. Do you have those set too? > On Feb 9, 2016, at 1:12 PM, Koert Kuipers wrote: > > yes its not using derby i think: i can see the tables in my actual hive > metastore. >

Re: Is there a any plan to develop SPARK with c++??

2016-02-03 Thread Benjamin Kim
Hi DaeJin, The closest thing I can think of is this. https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html Cheers, Ben > On Feb 3, 2016, at 9:49 PM, DaeJin Jung wrote: > > hello everyone, > I have a short question. > > I would

Re: Spark with SAS

2016-02-03 Thread Benjamin Kim
You can download the Spark ODBC Driver. https://databricks.com/spark/odbc-driver-download > On Feb 3, 2016, at 10:09 AM, Jörn Franke wrote: > > This could be done through ODBC. Keep in mind that you can run SAS jobs > directly on a Hadoop cluster using the SAS embedded

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread Benjamin Kim
Hi David, My company uses Lambda to do simple data moving and processing using Python scripts. I can see that using Spark instead for the data processing would make it a real production-level platform. Does this pave the way to replacing the need for a pre-instantiated cluster in AWS or bought

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-26 Thread Benjamin Kim
SPATH for this purpose, > but I couldn't get this to work for whatever reason, so I'm sticking to the > --jars approach used in my examples. > > On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > Stephen, > >

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-25 Thread Benjamin Kim
Spark Standalone per the spark.worker.cleanup.appDataTtl config param. > > The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, > but I couldn't get this to work for whatever reason, so I'm sticking to the > --jars approach used in my examples. >

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Hi Stephen, I forgot to mention that I added these lines below to the spark-defaults.conf on the node with Spark SQL Thrift JDBC/ODBC Server running on it. Then, I restarted it. spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
rk. > > 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com>: > >> Hi Stephen, >> >> I forgot to mention that I added these lines below to the >> spark-defaults.conf on the node with Spark
