[jira] [Commented] (YARN-6214) NullPointer Exception while querying timeline server API

2020-03-10 Thread Benjamin Kim (Jira)
[ https://issues.apache.org/jira/browse/YARN-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056315#comment-17056315 ] Benjamin Kim commented on YARN-6214: The root cause is that one of the apps is in init status, some

[jira] [Commented] (YARN-6214) NullPointer Exception while querying timeline server API

2020-02-27 Thread Benjamin Kim (Jira)
[ https://issues.apache.org/jira/browse/YARN-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047125#comment-17047125 ] Benjamin Kim commented on YARN-6214: It happened to me,   {code:java} {"exce

[spyder] Spyder 3.3.1 in Anaconda Navigator 1.8.7 Autocomplete and Online Help are not working

2018-08-15 Thread Benjamin Kim
I just started a Data Science class where they use Spyder as the IDE. After installing the latest Anaconda on my Macbook Pro with High Sierra and updating Spyder to 3.3.1, I got Spyder to launch fine. But, when I try to get information about objects and methods (cmd-i), nothing comes up. Also,

Re: Append In-Place to S3

2018-06-07 Thread Benjamin Kim
ted correctly, if you're joining then overwrite otherwise only > append as it removes dups. > > I think, in this scenario, just change it to write.mode('overwrite') because > you're already reading the old data and your job would be done. > > > On Sat 2 Jun, 2018, 10:27 PM Be

Re: Zeppelin 0.8

2018-06-07 Thread Benjamin Kim
Can anyone tell me what the status is for the 0.8 release? > On May 2, 2018, at 4:43 PM, Jeff Zhang wrote: > > > Yes, 0.8 will support spark 2.3 > > Benjamin Kim mailto:bbuil...@gmail.com>> wrote on Thu, May 3, 2018 at 1:59 AM: > Will Zeppelin 0.8 have Spark 2.3 support? >

Re: Credentials for JDBC

2018-06-07 Thread Benjamin Kim
Hi 종열, Can you show me how? Thanks, Ben > On Jun 6, 2018, at 10:32 PM, Jongyoul Lee wrote: > > We have a trick to get credential information from a credential page. I'll > take into it. > > On Thu, Jun 7, 2018 at 7:53 AM, Benjamin Kim <mailto:bbuil...@gmail.com>>

Credentials for JDBC

2018-06-06 Thread Benjamin Kim
I created a JDBC interpreter for AWS Athena, and it passes the access key as UID and secret key as PWD in the URL connection string. Does anyone know if I can setup each user to pass their own credentials in a, sort of, credentials file or config? Thanks, Ben

Re: Append In-Place to S3

2018-06-02 Thread Benjamin Kim
: > Benjamin, > > The append will append the "new" data to the existing data with removing > the duplicates. You would need to overwrite the file everytime if you need > unique values. > > Thanks, > Jayadeep > > On Fri, Jun 1, 2018 at 9:31 PM Benjamin Kim wrote

Append In-Place to S3

2018-06-01 Thread Benjamin Kim
I have a situation where I am trying to add only new rows to an existing data set that lives in S3 as gzipped parquet files, looping and appending for each hour of the day. First, I create a DF from the existing data, then I use a query to create another DF with the data that is new. Here is the
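The advice in the replies to this thread (read the old data, merge, then overwrite) can be sketched as follows. This is a minimal sketch, assuming Spark 2.x; the paths and the `event_id` key column are placeholders. Note that Spark cannot safely overwrite the same path it is reading from in one job, so the merged result is written to a separate location first.

```scala
// Sketch of the "append only new rows" pattern discussed in this thread.
// Assumptions: a SparkSession named `spark`, a dedup key column `event_id`,
// and placeholder S3 paths.
val existing = spark.read.parquet("s3a://bucket/events/")
val incoming = spark.read.parquet("s3a://bucket/staging/current-hour/")

// union + dropDuplicates removes rows already present in the existing data
val merged = existing.union(incoming).dropDuplicates("event_id")

// write to a new location, then swap paths; overwriting the input path
// in the same job risks losing the data being read
merged.write.mode("overwrite").parquet("s3a://bucket/events_merged/")
```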

Re: Zeppelin 0.8

2018-05-02 Thread Benjamin Kim
Will Zeppelin 0.8 have Spark 2.3 support? > On Apr 30, 2018, at 1:27 AM, Rotem Herzberg > wrote: > > Thanks > > On Mon, Apr 30, 2018 at 11:16 AM, Jeff Zhang > wrote: > > I am preparing the RC for 0.8 > > > Rotem

Re: Spark 2.2 Structured Streaming + Kinesis

2017-11-13 Thread Benjamin Kim
To add, we have a CDH 5.12 cluster with Spark 2.2 in our data center. On Mon, Nov 13, 2017 at 3:15 PM Benjamin Kim <bbuil...@gmail.com> wrote: > Does anyone know if there is a connector for AWS Kinesis that can be used > as a source for Structured Streaming? > > Thanks. > >

Databricks Serverless

2017-11-13 Thread Benjamin Kim
I have a question about this. The documentation compares the concept similar to BigQuery. Does this mean that we will no longer need to deal with instances and just pay for execution duration and amount of data processed? I’m just curious about how this will be priced. Also, when will it be ready

Spark 2.2 Structured Streaming + Kinesis

2017-11-13 Thread Benjamin Kim
Does anyone know if there is a connector for AWS Kinesis that can be used as a source for Structured Streaming? Thanks.

Serverless ETL

2017-10-17 Thread Benjamin Kim
With AWS having Glue and GCE having Dataprep, is Databricks coming out with an equivalent or better? I know that Serverless is a new offering, but will it go farther with automatic data schema discovery, profiling, metadata storage, change triggering, joining, transform suggestions, etc.? Just

DMP/CDP Profile Store

2017-08-30 Thread Benjamin Kim
I was wondering has anyone worked on a DMP/CDP for storing user and customer profiles in Kudu. Each user will have their base ID's aka identity graph along with statistics based on their attributes along with tables for these attributes grouped by category. Please let me know what you think of my

Re: Configure Impala for Kudu on Separate Cluster

2017-08-18 Thread Benjamin Kim
Todd, I'll keep this in mind. This information will be useful. I'll try again. Thanks, Ben On Wed, Aug 16, 2017 at 4:32 PM Todd Lipcon <t...@cloudera.com> wrote: > On Wed, Aug 16, 2017 at 6:16 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Hi, >> >&

Re: Configure Impala for Kudu on Separate Cluster

2017-08-15 Thread Benjamin Kim
t has been blocked (eg an iptables REJECT rule) > > -Todd > > On Mon, Aug 14, 2017 at 10:36 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Hi Todd, >> >> I tried to create a Kudu table using impala shell, and I got this error. >> >> c

Re: Configure Impala for Kudu on Separate Cluster

2017-08-15 Thread Benjamin Kim
; error message is admittedly pretty bad, but it basically means it's getting > "connection refused", indicating that either there is no master running on > that host or it has been blocked (eg an iptables REJECT rule) > > -Todd > > On Mon, Aug 14, 2017 at 10:36 PM, Benja

Re: Cloudera Spark 2.2

2017-08-04 Thread Benjamin Kim
wrote: > It was built. I think binaries are only available for official releases? > > > > -- > Ruslan Dautkhanov > > On Wed, Aug 2, 2017 at 4:41 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Did you build Zeppelin or download the binary? >> >>

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
lan Dautkhanov > > On Wed, Aug 2, 2017 at 4:31 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Does this work with Zeppelin 0.7.1? We got an error when setting SPARK_HOME >> in zeppelin-env.sh to what you have below. >> >> On Wed, Aug 2, 2017 at 3:24 PM Ruslan Dau

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
'_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.1.0.cloudera1 > /_/ > > > spark-submit and spark-shell are just shell script wrappers. > > > > -- > Ruslan Dautkhanov > > On Wed, Aug 2, 2017 at 10:22 AM, Benjamin Kim <bbuil...@gmail.com> wrote: > >

Geo Map Charting

2017-08-02 Thread Benjamin Kim
Anyone every try to chart density clusters or heat maps onto a geo map of the earth in Zeppelin? Can it be done? Cheers, Ben

Re: Cloudera Spark 2.2

2017-08-02 Thread Benjamin Kim
recompile Zeppelin with Scala 2.11? > Also Spark 2.2 now requires JDK8 I believe. > > > > -- > Ruslan Dautkhanov > > On Tue, Aug 1, 2017 at 6:26 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > >> Here is more. >> >> org.apache.zeppelin.interpreter.Interp

Re: Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
il.com> wrote: > > Then it is due to some classpath issue. I am not sure familiar with CDH, > please check whether spark of CDH include hadoop jar with it. > > > Benjamin Kim <bbuil...@gmail.com> wrote on Wed, Aug 2, 2017 at 8:22 AM: > >> Here is the error that was sent to me. >>

Re: Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
t;zjf...@gmail.com> wrote on Wed, Aug 2, 2017 at 8:18 AM: > >> >> What's the error you see in log ? >> >> >> Benjamin Kim <bbuil...@gmail.com> wrote on Wed, Aug 2, 2017 at 8:18 AM: >> >>> Has anyone configured Zeppelin 0.7.1 for Cloudera's release of Spark >>> 2.2?

Cloudera Spark 2.2

2017-08-01 Thread Benjamin Kim
Has anyone configured Zeppelin 0.7.1 for Cloudera's release of Spark 2.2? I can't get it to work. I downloaded the binary and set SPARK_HOME to /opt/cloudera/parcels/SPARK2/lib/spark2. I must be missing something. Cheers, Ben

Glue-like Functionality

2017-07-08 Thread Benjamin Kim
Has anyone seen AWS Glue? I was wondering if there is something similar going to be built into Spark Structured Streaming? I like the Data Catalog idea to store and track any data source/destination. It profiles the data to derive the schema and data types. Also, it does some sort-of automated

Centos 7 Compatibility

2017-06-21 Thread Benjamin Kim
All, I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it in the list of OS’s supported. Thanks, Ben

Re: Use SQL Script to Write Spark SQL Jobs

2017-06-12 Thread Benjamin Kim
Hi Bo, +1 for your project. I come from the world of data warehouses, ETL, and reporting analytics. There are many individuals who do not know or want to do any coding. They are content with ANSI SQL and stick to it. ETL workflows are also done without any coding using a drag-and-drop user

Re: Spark 2.1 and Hive Metastore

2017-04-09 Thread Benjamin Kim
ion you are asking about? > > - Dan > > On Sun, Apr 9, 2017 at 11:13 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > I’m curious about if and when Spark SQL will ever remove its dependency on > Hive Metastore. Now that Spark 2.1’

Spark 2.1 and Hive Metastore

2017-04-09 Thread Benjamin Kim
I’m curious about if and when Spark SQL will ever remove its dependency on Hive Metastore. Now that Spark 2.1’s SparkSession has superseded the need for HiveContext, are there plans for Spark to no longer use the Hive Metastore service with a “SparkSchema” service with a PostgreSQL, MySQL, etc.

Re: Spark on Kudu Roadmap

2017-04-09 Thread Benjamin Kim
U-1676> ) so you may want to file a > JIRA to help track this feature. > > Mike > > > On Mon, Mar 27, 2017 at 11:55 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Mike, > > I believe what we are looking for is this be

Re: Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
wrote: > > Hi Ben, > Is there anything in particular you are looking for? > > Thanks, > Mike > > On Mon, Mar 27, 2017 at 9:48 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi, > > Are there any plans for deeper inte

Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
Hi, Are there any plans for deeper integration with Spark especially Spark SQL? Is there a roadmap to look at, so I can know what to expect in the future? Cheers, Ben

Re: Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
> caching. Also I don't recall Tachyon providing POSIX semantics. > > Mike > > Sent from my iPhone > >> On Mar 25, 2017, at 9:50 AM, Benjamin Kim <bbuil...@gmail.com> wrote: >> >> Hi, >> >> Does anyone know of a way to use AWS S3 or >

Kudu on top of Alluxio

2017-03-25 Thread Benjamin Kim
Hi, Does anyone know of a way to use AWS S3 or

Security Roadmap

2017-03-18 Thread Benjamin Kim
I’m curious as to what security features we can expect coming in the near and far future for Kudu. If there is some documentation for this, please let me know. Cheers, Ben

Login/Logout Problem

2017-03-01 Thread Benjamin Kim
We are running into problems with users logging in and staying logged in. When they try to run JDBC queries or even open a notebook, they get flickering in the browser where the green color dot next to the username turns red, then back to green, then back to red, etc. When it stops doing that,

Zeppelin Service Install

2017-03-01 Thread Benjamin Kim
Has anyone installed Zeppelin on a CentOS/RedHat server and made it into a service? I can’t seem to find the instructions on how to do this. Cheers, Ben

Re: Get S3 Parquet File

2017-02-24 Thread Benjamin Kim
e you do not want to be writing code which needs you to update it once > again in 6 months because newer versions of SPARK now find it deprecated. > > > Regards, > Gourav Sengupta > > > > On Fri, Feb 24, 2017 at 7:18 AM, Benjamin Kim <bbuil...@gmail.com >

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
.0. We are > waiting for the move to Spark 2.0/2.1. > > And besides that would you not want to work on a platform which is at least > 10 times faster What would that be? > > Regards, > Gourav Sengupta > > On Thu, Feb 23, 2017 at 6:23 PM, Benjamin Kim <bbuil...@gmail.com

Re: Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
code and see if the issue resolves, then it can be > hidden and read from Input Params. > > Thanks, > Aakash. > > > On 23-Feb-2017 11:54 PM, "Benjamin Kim" <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > We are trying to use Spark 1.

Get S3 Parquet File

2017-02-23 Thread Benjamin Kim
We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet file from AWS S3. We can read the schema and show some data when the file is loaded into a DataFrame, but when we try to do some operations, such as count, we get this error below.
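The symptom described here (schema and `show` work, but `count` fails) fits how Parquet reads behave: the schema comes from the file footer alone, while `count` forces a full scan of the data, which is where S3 connector errors surface. A minimal sketch of the read, assuming Spark 1.6 with the `hadoop-aws` s3a connector on the classpath; bucket, key, and credential sources are placeholders:

```scala
// Hedged sketch: reading a Parquet file from S3 with Spark 1.6.
// Assumptions: hadoop-aws / s3a on the classpath; credentials in env vars.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadS3Parquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadS3Parquet"))
    val sqlContext = new SQLContext(sc)

    // Credentials can also come from instance profiles instead of env vars
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    val df = sqlContext.read.parquet("s3a://my-bucket/path/to/file.parquet")
    df.printSchema()    // reads only the footer, so this succeeds cheaply
    println(df.count()) // forces a full scan, where connector errors surface
  }
}
```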

Re: Parquet Gzipped Files

2017-02-14 Thread Benjamin Kim
gt; wrote: > > Your vendor should use the parquet internal compression and not take a > parquet file and gzip it. > >> On 13 Feb 2017, at 18:48, Benjamin Kim <bbuil...@gmail.com> wrote: >> >> We are receiving files from an outside vendor who creates a Parqu

Parquet Gzipped Files

2017-02-13 Thread Benjamin Kim
We are receiving files from an outside vendor who creates a Parquet data file and Gzips it before delivery. Does anyone know how to Gunzip the file in Spark and inject the Parquet data into a DataFrame? I thought using sc.textFile or sc.wholeTextFiles would automatically Gunzip the file, but
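As the reply in this thread notes, Parquet should use its internal compression rather than be externally gzipped; `sc.textFile` only auto-decompresses line-based text, not binary Parquet. Until the vendor changes their output, one workaround is to gunzip the file first and then read it as Parquet. A sketch under those assumptions, with placeholder local paths (for S3/HDFS you would stream through the Hadoop FileSystem API instead):

```scala
// Sketch: decompress a vendor-delivered file.parquet.gz, then load as Parquet.
// Assumes a SQLContext named `sqlContext` is in scope; paths are placeholders.
import java.io.{FileInputStream, FileOutputStream}
import java.util.zip.GZIPInputStream

def gunzip(src: String, dst: String): Unit = {
  val in  = new GZIPInputStream(new FileInputStream(src))
  val out = new FileOutputStream(dst)
  val buf = new Array[Byte](8192)
  // copy until read() signals end-of-stream with -1
  Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
  in.close(); out.close()
}

gunzip("/data/in/file.parquet.gz", "/data/in/file.parquet")
val df = sqlContext.read.parquet("/data/in/file.parquet")
```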

Remove dependence on HDFS

2017-02-11 Thread Benjamin Kim
Has anyone got some advice on how to remove the reliance on HDFS for storing persistent data. We have an on-premise Spark cluster. It seems like a waste of resources to keep adding nodes because of a lack of storage space only. I would rather add more powerful nodes due to the lack of

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
t; > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Asher, > > I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java > (1.8) version as our installation. The Scala (2.10.5) vers

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Dverbose=true"? And did you see only scala 2.10.5 being pulled in? > > On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Asher, > > It’s still the same. Do you have any other ideas? > > Cheers, > Ben

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
ly, you might want to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Asher, > > I modifie

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
o, if you're seeing this locally, you might want to > check which version of the scala sdk your IDE is using > > Asher Krim > Senior Software Engineer > > On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote: > > Hi Asher, > > I modified th

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
her Krim <ak...@hubspot.com> wrote: > > Ben, > > That looks like a scala version mismatch. Have you checked your dep tree? > > Asher Krim > Senior Software Engineer > > > On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
ltSource.createRelation(HBaseRelation.scala:51) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) If you can please help, I would be grateful. Cheers, Ben >

Re: HBase Spark

2017-01-31 Thread Benjamin Kim
Elek, If I cannot use the HBase Spark module, then I’ll give it a try. Thanks, Ben > On Jan 31, 2017, at 1:02 PM, Marton, Elek <h...@anzix.net> wrote: > > > I tested this one with hbase 1.2.4: > > https://github.com/hortonworks-spark/shc > > Marton > >

HBase Spark

2017-01-31 Thread Benjamin Kim
Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried to build it from source, but I cannot get it to work. Thanks, Ben - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: PostgreSQL JDBC Connections

2017-01-05 Thread Benjamin Kim
k with dynamic forms in a more > meaningful way e.g. use SQL results to create a new drop down to drive the > next page etc… > > > >> On Jan 5, 2017, at 12:57 PM, Benjamin Kim <bbuil...@gmail.com> wrote: >> >> We are getting “out of shared memory” errors when

Re: Merging Parquet Files

2016-12-22 Thread Benjamin Kim
k/kite This might be useful. Thanks! 2016-12-23 7:01 GMT+09:00 Benjamin Kim <bbuil...@gmail.com>: Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them into 1 file after they are output from Spark. Doing a coalesce(1) on the Spark cluster will not work. It just d

Re: Merging Parquet Files

2016-12-22 Thread Benjamin Kim
s://issues.apache.org/jira/browse/PARQUET-460> > > It seems parquet-tools allows merge small Parquet files into one. > > > Also, I believe there are command-line tools in Kite - > https://github.com/kite-sdk/kite <https://github.com/kite-sdk/kite> > > This might

Merging Parquet Files

2016-12-22 Thread Benjamin Kim
Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them into 1 file after they are output from Spark. Doing a coalesce(1) on the Spark cluster will not work. It just does not have the resources to do it. I'm trying to do it using the commandline and not use Spark. I will
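One reason `coalesce(1)` runs out of resources is that it collapses the entire read pipeline into a single task. If a Spark-based merge is still an option, `repartition(1)` keeps the read parallel and only shuffles the final write down to one file. A hedged sketch with placeholder paths:

```scala
// Sketch: coalesce(1) pulls the whole read into one task; repartition(1)
// reads in parallel and shuffles into a single output partition instead.
// Assumes a SQLContext named `sqlContext`; paths are placeholders.
val df = sqlContext.read.parquet("s3a://bucket/hourly/*.gz.parquet")
df.repartition(1)
  .write.mode("overwrite")
  .parquet("s3a://bucket/merged/")
```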

Re: Deep learning libraries for scala

2016-11-01 Thread Benjamin Kim
.@gmail.com> wrote: > > Agreed. But as it states deeper integration with (scala) is yet to be > developed. > Any thoughts on how to use tensorflow with scala ? Need to write wrappers I > think. > > > On Oct 19, 2016 7:56 AM, "Benjamin Kim" <bbuil...@gmail.com

Spark Streaming and Kinesis

2016-10-27 Thread Benjamin Kim
Has anyone worked with AWS Kinesis and retrieved data from it using Spark Streaming? I am having issues where it’s returning no data. I can connect to the Kinesis stream and describe using Spark. Is there something I’m missing? Are there specific IAM security settings needed? I just simply

Re: Deep learning libraries for scala

2016-10-19 Thread Benjamin Kim
On that note, here is an article that Databricks made regarding using Tensorflow in conjunction with Spark. https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html Cheers, Ben > On Oct 19, 2016, at 3:09 AM, Gourav Sengupta >

JDBC Connections

2016-10-18 Thread Benjamin Kim
We are using Zeppelin 0.6.0 as a self-service for our clients to query our PostgreSQL databases. We are noticing that the connections are not closing after each one of them is done. What is the normal operating procedure to have these connections close when idle? Our scope for the JDBC

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
lta load data in spark > table cache and expose it through the thriftserver. But you have to implement > the loading logic, it can be very simple to very complex depending on your > needs. > > > 2016-10-17 19:48 GMT+02:00 Benjamin Kim <bbuil...@gmail.com > <mailto:bb

Re: Spark SQL Thriftserver with HBase

2016-10-17 Thread Benjamin Kim
y. > > With respect to Tableau… their entire interface in to the big data world > revolves around the JDBC/ODBC interface. So if you don’t have that piece as > part of your solution, you’re DOA w respect to Tableau. > > Have you considered Drill as your JDBC connecti

Re: Inserting New Primary Keys

2016-10-10 Thread Benjamin Kim
gt; wrote: > > Is there only one process adding rows? because this seems a little risky if > you have multiple threads doing that… > >> On Oct 8, 2016, at 1:43 PM, Benjamin Kim <bbuil...@gmail.com >> <mailto:bbuil...@gmail.com>> wrote: >> >> Mich,

Inserting New Primary Keys

2016-10-08 Thread Benjamin Kim
I have a table with data already in it that has primary keys generated by the function monotonicallyIncreasingId. Now, I want to insert more data into it with primary keys that will auto-increment from where the existing data left off. How would I do this? There is no argument I can pass into
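`monotonicallyIncreasingId` takes no starting offset, but the same effect can be approximated by finding the current maximum key and adding a row number to it. A sketch, assuming the key column is a `Long` named `id` and that `existingDF`/`newRowsDF` are the existing and incoming DataFrames; note the global window forces all new rows through one partition, which is fine for modest batch sizes:

```scala
// Sketch: continue primary keys from the current maximum.
// row_number() is 1-based, so keys become maxId+1, maxId+2, ...
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, max, row_number}

val maxId = existingDF.agg(max("id")).first().getLong(0)
val w = Window.orderBy(lit(1)) // arbitrary global ordering; single-partition caveat
val withKeys = newRowsDF.withColumn("id", row_number().over(w) + lit(maxId))
```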

Re: Kudu Command Line Client

2016-10-07 Thread Benjamin Kim
Todd, That works. Thanks, Ben > On Oct 7, 2016, at 5:03 PM, Todd Lipcon <t...@cloudera.com> wrote: > > On Fri, Oct 7, 2016 at 5:01 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Todd, > > I’m trying to use: > > kudu

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
Mich, I know up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work using the command below. But after upgrading to CDH 5.7, it became unnecessary. echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt Hope this helps.

Re: Spark on Kudu

2016-09-20 Thread Benjamin Kim
; On Sep 20, 2016, at 1:44 PM, Todd Lipcon <t...@cloudera.com >> <mailto:t...@cloudera.com>> wrote: >> >> On Tue, Sep 20, 2016 at 1:18 PM, Benjamin Kim <bbuil...@gmail.com >> <mailto:bbuil...@gmail.com>> wrote: >> Now that Kudu 1.0.0 is officially out and r

Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Benjamin Kim
Todd, Thanks. I’ll look into those. Cheers, Ben > On Sep 20, 2016, at 12:11 AM, Todd Lipcon wrote: > > The Apache Kudu team is happy to announce the release of Kudu 1.0.0! > > Kudu is an open source storage engine for structured data which supports > low-latency random

Re: [ANNOUNCE] Apache Kudu 1.0.0 release

2016-09-20 Thread Benjamin Kim
This is awesome!!! Great!!! Do you know if any improvements were also made to the Spark plugin jar? Thanks, Ben > On Sep 20, 2016, at 12:11 AM, Todd Lipcon wrote: > > The Apache Kudu team is happy to announce the release of Kudu 1.0.0! > > Kudu is an open source storage

Re: JDBC Very Slow

2016-09-16 Thread Benjamin Kim
. Thanks, Ben > On Sep 16, 2016, at 3:29 PM, Nikolay Zhebet <phpap...@gmail.com> wrote: > > Hi! Can you split init code with current comand? I thing it is main problem > in your code. > > 16 сент. 2016 г. 8:26 PM пользователь "Benjamin Kim" <bbuil...@gm

JDBC Very Slow

2016-09-16 Thread Benjamin Kim
Has anyone using Spark 1.6.2 encountered very slow responses from pulling data from PostgreSQL using JDBC? I can get to the table and see the schema, but when I do a show, it takes very long or keeps timing out. The code is simple. val jdbcDF = sqlContext.read.format("jdbc").options(
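A default JDBC read pulls the entire table through a single connection on one executor, which matches the "schema works, show hangs" symptom. Partitioning the read usually helps. A sketch extending the options map from the snippet above; the URL, table, partition column, and bounds are placeholders, and the `fetchsize` option is an assumption about the Spark version in use:

```scala
// Sketch: partitioned JDBC read so the pull is spread across executors.
// Assumes a SQLContext named `sqlContext`; connection details are placeholders.
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:postgresql://dbhost:5432/mydb",
  "dbtable"         -> "public.events",
  "user"            -> "benjamin",
  "password"        -> "********",
  "partitionColumn" -> "id",      // numeric column to split the read on
  "lowerBound"      -> "1",
  "upperBound"      -> "10000000",
  "numPartitions"   -> "8",
  "fetchsize"       -> "10000"    // PostgreSQL otherwise buffers whole result sets
)).load()
```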

Re: Using Spark SQL to Create JDBC Tables

2016-09-13 Thread Benjamin Kim
xposing data ie create hive > tables which "point to" any other DB. i know Oracle provides there own Serde > for hive. Not sure about PG though. > > Once tables are created in hive, STS will automatically see it. > > On Wed, Sep 14, 2016 at 11:08 AM, Benjam

Using Spark SQL to Create JDBC Tables

2016-09-13 Thread Benjamin Kim
Has anyone created tables using Spark SQL that directly connect to a JDBC data source such as PostgreSQL? I would like to use Spark SQL Thriftserver to access and query remote PostgreSQL tables. In this way, we can centralize data access to Spark SQL tables along with PostgreSQL making it very
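For Spark 1.6, the JDBC data source can be registered as a table directly from SQL. A sketch with placeholder connection details; `TEMPORARY` tables live only in the current session, so for the Thriftserver use case a persistent (Hive-metastore-backed) definition would be needed instead, as the reply in this thread suggests:

```scala
// Sketch: register a PostgreSQL table through the Spark SQL JDBC data source.
// Assumes a SQLContext named `sqlContext`; connection details are placeholders.
sqlContext.sql("""
  CREATE TEMPORARY TABLE pg_events
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:postgresql://dbhost:5432/mydb',
    dbtable 'public.events',
    user 'benjamin',
    password '********'
  )
""")
sqlContext.sql("SELECT count(*) FROM pg_events").show()
```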

Re: Spark SQL Thriftserver

2016-09-13 Thread Benjamin Kim
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary dama

Spark SQL Thriftserver

2016-09-13 Thread Benjamin Kim
Does anyone have any thoughts about using Spark SQL Thriftserver in Spark 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for it. Some advice and gotcha’s would be nice to know. Thanks, Ben - To

Re: Spark Metrics: custom source/sink configurations not getting recognized

2016-09-07 Thread Benjamin Kim
We use Graphite/Grafana for custom metrics. We found Spark’s metrics not to be customizable. So, we write directly using Graphite’s API, which was very easy to do using Java’s socket library in Scala. It works great for us, and we are going one step further using Sensu to alert us if there is
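The socket approach described above maps to Graphite's plaintext protocol, which accepts one `path value timestamp` line per metric. A minimal sketch; the host, port, and metric path are placeholders, and a production version would reuse the socket rather than open one per metric:

```scala
// Sketch: push a custom metric to Graphite's plaintext protocol over a socket.
// Assumptions: Graphite listening on graphite.local:2003 (the default port).
import java.io.PrintWriter
import java.net.Socket

def sendToGraphite(metric: String, value: Double,
                   host: String = "graphite.local", port: Int = 2003): Unit = {
  val socket = new Socket(host, port)
  val out = new PrintWriter(socket.getOutputStream, true)
  val epochSeconds = System.currentTimeMillis() / 1000
  out.println(s"$metric $value $epochSeconds") // "path value timestamp\n"
  out.close()
  socket.close()
}

sendToGraphite("spark.jobs.etl.records_written", 12345d)
```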

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > On 3 September 2016 at 20:31, Benjamin

Re: Spark SQL Tables on top of HBase Tables

2016-09-03 Thread Benjamin Kim
2 September 2016 at 23:08, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com > <mailto:mdkhajaasm...@gmail.com>> wrote: > Hi Kim, > > I am also looking for same information. Just got the same requirement today. > > Thanks, > Asmath > > On Fri, Sep 2, 2016

Spark SQL Tables on top of HBase Tables

2016-09-02 Thread Benjamin Kim
I was wondering if anyone has tried to create Spark SQL tables on top of HBase tables so that data in HBase can be accessed using Spark Thriftserver with SQL statements? This is similar what can be done using Hive. Thanks, Ben

Spark 1.6 Streaming with Checkpointing

2016-08-26 Thread Benjamin Kim
I am trying to implement checkpointing in my streaming application but I am getting a not serializable error. Has anyone encountered this? I am deploying this job in YARN clustered mode. Here is a snippet of the main parts of the code. object S3EventIngestion { //create and setup streaming
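The usual cause of a not-serializable error with checkpointing is building the streaming graph outside the factory function, so driver-side objects get captured into the checkpoint. A sketch of the standard Spark 1.6 pattern, with placeholder names and paths; all sources and transformations go inside the factory passed to `getOrCreate`:

```scala
// Sketch: checkpoint-safe streaming setup. Everything that defines the graph
// lives inside createContext(), so nothing non-serializable is captured.
// The checkpoint path and batch interval are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3EventIngestion {
  val checkpointDir = "hdfs:///checkpoints/s3-event-ingestion"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("S3EventIngestion")
    val ssc = new StreamingContext(conf, Seconds(60))
    ssc.checkpoint(checkpointDir)
    // define sources and transformations here, not in main()
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```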

HBase-Spark Module

2016-07-29 Thread Benjamin Kim
I would like to know if anyone has tried using the hbase-spark module? I tried to follow the examples in conjunction with CDH 5.8.0. I cannot find the HBaseTableCatalog class in the module or in any of the Spark jars. Can someone help? Thanks, Ben

Re: Pass Credentials through JDBC

2016-07-28 Thread Benjamin Kim
this help, > Jongyoul > > On Fri, Jul 29, 2016 at 12:08 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Hi Jonyoul, > > How would I enter credentials with the current version of Zeppelin? Do you > know of a way to make it work no

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Benjamin Kim
It is included in Cloudera’s CDH 5.8. > On Jul 22, 2016, at 6:13 PM, Mail.com wrote: > > Hbase Spark module will be available with Hbase 2.0. Is that out yet? > >> On Jul 22, 2016, at 8:50 PM, Def_Os wrote: >> >> So it appears it should be possible

Re: transtition SQLContext to SparkSession

2016-07-18 Thread Benjamin Kim
From what I read, there are no more separate contexts. "SparkContext, SQLContext, HiveContext merged into SparkSession" I have not tested it, so I don’t know if it’s true. Cheers, Ben > On Jul 18, 2016, at 8:37 AM, Koert Kuipers wrote: > > in my codebase i would like to
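To be precise, in Spark 2.x `SparkSession` subsumes the old entry points, but the underlying `SparkContext` and a compatibility `SQLContext` remain reachable from the session. A sketch of the migration:

```scala
// Sketch: Spark 2.x entry point replacing SQLContext/HiveContext.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("migration")
  .enableHiveSupport() // replaces the old HiveContext
  .getOrCreate()

val sc = spark.sparkContext       // the underlying SparkContext still exists
val sqlContext = spark.sqlContext // kept for backward compatibility
```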

Re: Performance Question

2016-07-18 Thread Benjamin Kim
<t...@cloudera.com> wrote: > > On Mon, Jul 18, 2016 at 10:31 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Todd, > > Thanks for the info. I was going to upgrade after the testing, but now, it > looks like I will have to do it earlier

Re: Performance Question

2016-07-18 Thread Benjamin Kim
t the server. It will > recreate a new repaired replica automatically. > > -Todd > > On Mon, Jul 18, 2016 at 10:28 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > During my re-population of the Kudu table, I am getting this erro

Re: Performance Question

2016-07-18 Thread Benjamin Kim
(unknown) @ 0x344d41ed5d (unknown) @ 0x7811d1 (unknown) Does anyone know what this means? Thanks, Ben > On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote: > > On Mon, Jul 11, 2016 at 10:40 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...

Re: Spark Website

2016-07-13 Thread Benjamin Kim
It takes me to the directories instead of the webpage. > On Jul 13, 2016, at 11:45 AM, manish ranjan <cse1.man...@gmail.com> wrote: > > working for me. What do you mean 'as supposed to'? > > ~Manish > > > > On Wed, Jul 13, 2016 at 11:45 AM, Benjamin Kim <

Spark Website

2016-07-13 Thread Benjamin Kim
Has anyone noticed that the spark.apache.org is not working as supposed to? - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Zeppelin 0.6.0 on CDH 5.7.1

2016-07-12 Thread Benjamin Kim
ark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64) > > > _ > From: Benjamin Kim <bbuil...@gmail.com <mailto:bbuil...@gmail.com>> > Sent: Saturday, July 9, 2016 10:54 PM > Subject: Re: [ANNOUNCE] Apache Zeppelin 0.6.0 released > To: <us...@ze

Re: Performance Question

2016-07-11 Thread Benjamin Kim
Todd, It’s no problem to start over again. But, a tool like that would be helpful. Gaps in data can be accommodated for by just back filling. Thanks, Ben > On Jul 11, 2016, at 10:47 AM, Todd Lipcon <t...@cloudera.com> wrote: > > On Mon, Jul 11, 2016 at 10:40 AM, Benj

Re: Performance Question

2016-07-11 Thread Benjamin Kim
Todd > > On Mon, Jul 11, 2016 at 10:35 AM, Benjamin Kim <b...@amobee.com > <mailto:b...@amobee.com>> wrote: > Over the weekend, a tablet server went down. It’s not coming back up. So, I > decommissioned it and removed it from the cluster. Then, I restarted Kudu > be

Re: [ANNOUNCE] Apache Zeppelin 0.6.0 released

2016-07-09 Thread Benjamin Kim
Regards, > JL > > On Sat, Jul 9, 2016 at 11:47 PM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Feix, > > I added hive-site.xml to the conf directory and restarted Zeppelin. Now, I > get another error: > > java.lang.ClassNotFo

Re: [ANNOUNCE] Apache Zeppelin 0.6.0 released

2016-07-09 Thread Benjamin Kim
ite.xml) - Spark's log should indicate that. > > > _________ > From: Benjamin Kim <bbuil...@gmail.com <mailto:bbuil...@gmail.com>> > Sent: Friday, July 8, 2016 6:53 AM > Subject: Re: [ANNOUNCE] Apache Zeppelin 0.6.0 released > To: <users@zeppel

Re: Performance Question

2016-07-08 Thread Benjamin Kim
in production”, as management tends to say. Cheers, Ben > On Jul 6, 2016, at 9:46 AM, Dan Burkert <d...@cloudera.com> wrote: > > > > On Wed, Jul 6, 2016 at 7:05 AM, Benjamin Kim <bbuil...@gmail.com > <mailto:bbuil...@gmail.com>> wrote: > Over the weekend, th

Re: [ANNOUNCE] Apache Zeppelin 0.6.0 released

2016-07-08 Thread Benjamin Kim
8, 2016, at 2:01 AM, Felix Cheung <felixcheun...@hotmail.com> wrote: > > Is this possibly caused by CDH requiring a build-from-source instead of the > official binary releases? > > > > > > On Thu, Jul 7, 2016 at 8:22 PM -0700, "Benjamin Kim"
