Hi Stephen,
I forgot to mention that I added these lines below to the spark-defaults.conf on
the node with the Spark SQL Thrift JDBC/ODBC Server running on it. Then, I
restarted it.
spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
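For completeness, a sketch of the relevant spark-defaults.conf entries; the executor line is an assumption on my part (only the driver line appears in the message above), for the case where the executors also need the JDBC driver on their classpath:

spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar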
Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
>
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose,
> but I couldn't get this to work for whatever reason, so I'm sticking to the
> --jars approach used in my examples.
>
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Stephen,
>
>
> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Does anyone know how to save data in a DataFrame to
> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> The table already exists.
>
> CREATE EXTERNAL TABLE `amo
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed: No
> Num Buckets:
Does anyone know how to save data in a DataFrame to a table partitioned using
an existing column reformatted into a derived column?
val partitionedDf = df.withColumn("dt",
  concat(substring($"timestamp", 1, 10), lit(" "), substring($"timestamp", 12, 2), lit(":00")))
Has anyone run into this requirement?
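A minimal sketch of one way to write the DataFrame above partitioned by the derived "dt" column (the target path and save mode are placeholders, not from the original thread):

// assumes partitionedDf from the snippet above; "dt" becomes the partition directory column
partitionedDf.write
  .mode("append")
  .partitionBy("dt")
  .parquet("/path/to/warehouse/events")   // hypothetical output location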
We have a need to track data integrity and model quality metrics of outcomes so
that we can gauge both whether the data coming in is healthy and whether the
models run against it are still performing well and not giving faulty results. A
nice to have would be to graph
Has anyone implemented a way to track the performance of a data model? We
currently have an algorithm to do record linkage and spit out statistics of
matches, non-matches, and/or partial matches with reason codes of why we didn’t
match accurately. In this way, we will know if something goes
I got the same problem when I added the Phoenix plugin jar in the driver and
executor extra classpaths. Do you have those set too?
> On Feb 9, 2016, at 1:12 PM, Koert Kuipers wrote:
>
> yes, it's not using Derby, I think: I can see the tables in my actual Hive
> metastore.
>
Hi DaeJin,
The closest thing I can think of is this.
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
Cheers,
Ben
> On Feb 3, 2016, at 9:49 PM, DaeJin Jung wrote:
>
> hello everyone,
> I have a short question.
>
> I would
Hi David,
My company uses Lambda to do simple data moving and processing using Python
scripts. I can see using Spark instead for the data processing would make it
into a real production level platform. Does this pave the way into replacing
the need of a pre-instantiated cluster in AWS or bought
You can download the Spark ODBC Driver.
https://databricks.com/spark/odbc-driver-download
> On Feb 3, 2016, at 10:09 AM, Jörn Franke wrote:
>
> This could be done through ODBC. Keep in mind that you can run SAS jobs
> directly on a Hadoop cluster using the SAS embedded
Ted,
Any idea as to when this will be released?
Thanks,
Ben
> On Feb 17, 2016, at 2:53 PM, Ted Yu wrote:
>
> The HBASE JIRA below is for HBase 2.0
>
> HBase Spark module would be back ported to hbase 1.3.0
>
> FYI
>
> On Feb 17, 2016, at 1:13 PM, Chandeep Singh wrote:
>
> Could you wrap the ZipInputStream in a List, since a subtype of
> TraversableOnce[?] is required?
>
> case (name, content) => List(new ZipInputStream(content.open))
>
> Xinh
>
> On Wed, Mar 9, 2016 at 7:07 AM, Benjamin Kim <bbuil...@gmail.com
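For context, a minimal sketch of reading zipped text from S3 with sc.binaryFiles and a ZipInputStream, assuming single-entry archives; the path and the line-reading approach are illustrative, not taken verbatim from the thread:

import java.io.{BufferedReader, InputStreamReader}
import java.util.zip.ZipInputStream

val zipped = sc.binaryFiles("s3n://events/2016/03/01/*")      // hypothetical path
val lines = zipped.flatMap { case (name, content) =>
  val zis = new ZipInputStream(content.open)                  // content is a PortableDataStream
  zis.getNextEntry                                            // advance to the first (only) entry
  val reader = new BufferedReader(new InputStreamReader(zis))
  Iterator.continually(reader.readLine()).takeWhile(_ != null)
}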
1.0 root dir and add the following to root pom.xml:
> hbase-spark
>
> Then you would be able to build the module yourself.
>
> hbase-spark module uses APIs which are compatible with hbase 1.0
>
> Cheers
>
> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.
Hi Ted,
I see that you’re working on the hbase-spark module for hbase. I recently
packaged the SparkOnHBase project and gave it a test run. It works like a charm
on CDH 5.4 and 5.5. All I had to do was add
/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the
classpath.txt
is not in branch-1.
>
> compressionByName() resides in class with @InterfaceAudience.Private which
> got moved in master branch.
>
> So looks like there is some work to be done for backporting to branch-1 :-)
>
> On Sun, Mar 13, 2016 at 1:35 PM, Benjamin Kim <bbuil...@gmail
I am wondering if anyone can help.
Our company stores zipped CSV files in S3, which has been a big headache from
the start. I was wondering if anyone has created a way to iterate through
several subdirectories (s3n://events/2016/03/01/00, s3n://2016/03/01/01, etc.)
in S3 to find the newest
ple files in each zip? Single file archives are processed just
> like text as long as it is one of the supported compression formats.
>
> Regards
> Sab
>
> On Wed, Mar 9, 2016 at 10:33 AM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
>
y-MM-dd'))
> AS TransactionDate
> , TransactionType
> , Description
> , Value
> , Balance
> , AccountName
> , AccountNumber
> FROM tmp
> """
> sql(sqltext)
>
> println ("\nFinished at");
.load("s3://" + bucket + "/" + key)
//save to hbase
})
ssc.checkpoint(checkpointDirectory) // set checkpoint directory
ssc
}
Thanks,
Ben
> On Apr 9, 2016, at 6:12 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Ah, I spoke too soon.
>
>
> Sent from my iPhone
>
> On Apr 9, 2016, at 9:55 AM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
>
>> Nezih,
>>
>> This looks like a good alternative to having the Spark Streaming job check
>> for new files
w S3 files created in your every batch interval.
>
> Thanks,
> Natu
>
> On Fri, Apr 8, 2016 at 9:14 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Has anyone monitored an S3 bucket or directory us
ext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", AWSSecretAccessKey)
>
> val inputS3Stream = ssc.textFileStream("s3://example_bucket/folder")
>
> This code will probe for new S3 files created in every batch interval.
>
> Thanks,
>
, please let me know.
Thanks,
Ben
> On Apr 9, 2016, at 2:49 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> This was easy!
>
> I just created a notification on a source S3 bucket to kick off a Lambda
> function that would decompress the dropped file and save it to another
to be the endpoint of this
notification. This would then convey to a listening Spark Streaming job the
file information to download. I like this!
Cheers,
Ben
> On Apr 9, 2016, at 9:54 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> This is awesome! I have someplace to start from.
>
Has anyone monitored an S3 bucket or directory using Spark Streaming and pulled
any new files to process? If so, can you provide basic Scala coding help on
this?
Thanks,
Ben
I need a little help. I am loading into Spark 1.6 zipped csv files stored in s3.
First of all, I am able to get the List of file keys that have a modified date
within a range of time by using the AWS SDK Objects (AmazonS3Client,
ObjectListing, S3ObjectSummary, ListObjectsRequest,
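A minimal sketch of that listing step with the AWS SDK classes named above; the bucket name and time window are placeholders, and a real job would follow the listing's truncation marker to page past 1000 keys:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ListObjectsRequest
import scala.collection.JavaConverters._

val s3 = new AmazonS3Client()                                   // picks up the default AWS credentials
val listing = s3.listObjects(new ListObjectsRequest().withBucketName("events-bucket"))  // hypothetical bucket
val cutoff = new java.util.Date(System.currentTimeMillis() - 60 * 60 * 1000)
val recentKeys = listing.getObjectSummaries.asScala
  .filter(_.getLastModified.after(cutoff))                      // keep keys modified within the last hour
  .map(_.getKey)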
I wonder if anyone has opened a SFTP connection to open a remote GZIP CSV file?
I am able to download the file first locally using the SFTP Client in the
spark-sftp package. Then, I load the file into a dataframe using the spark-csv
package, which automatically decompresses the file. I just
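For the loading step, a minimal sketch with spark-csv on the locally downloaded file (the path is a placeholder); gzip text input is decompressed transparently by Hadoop's codec support:

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file:///tmp/downloaded.csv.gz")   // hypothetical local path from the SFTP download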
I want to ask about something related to this.
Does anyone know if there is or will be a command line equivalent of
spark-shell client for Livy Spark Server or any other Spark Job Server? The
reason that I am asking is that spark-shell does not handle multiple users on the same
server well. Since a
<sw...@snappydata.io> wrote:
>
> (-user)
>
> On Thursday 03 March 2016 10:09 PM, Benjamin Kim wrote:
>> I forgot to mention that we will be scheduling this job using Oozie. So, we
>> will not be able to know which worker node is going to be running this.
>> If we try
To comment…
At my company, we have not gotten it to work in any other mode than local. If
we try any of the yarn modes, it fails with a “file does not exist” error when
trying to locate the executable jar. I mentioned this to the Hue users group,
which we used for this, and they replied that
Does anyone know if this is possible? I have an RDD loaded with rows of CSV
data strings. Each string representing the header row and multiple rows of data
along with delimiters. I would like to feed each thru a CSV parser to convert
the data into a dataframe and, ultimately, UPSERT a
I have a quick question. I have downloaded multiple zipped files from S3 and
unzipped each one of them into strings. The next step is to parse using a CSV
parser. I want to know if there is a way to easily use the spark csv package
for this?
Thanks,
Ben
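As the replies elsewhere in this thread suggest, spark-csv's CsvParser can consume an RDD[String] directly; a hedged sketch (the builder-style useHeader call is my reading of the CsvParser API, and the sample rows are stand-ins for the unzipped file contents):

import com.databricks.spark.csv.CsvParser

val csvRdd = sc.parallelize(Seq("id,name", "1,alice", "2,bob"))  // stand-in for the unzipped strings
val df = new CsvParser()
  .withUseHeader(true)
  .csvRdd(sqlContext, csvRdd)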
Hi Gil,
Currently, our company uses S3 heavily for data storage. Can you further
explain the benefits of this in relation to S3 when the pending patch does come
out? Also, I have heard of Swift from others. Can you explain to me the pros
and cons of Swift compared to HDFS? It can be just a
> I tried saving DF to HBase using a hive table with hbase storage handler and
> hiveContext but it failed due to a bug.
>
> I was able to persist the DF to hbase using Apache Phoenix which was pretty
> simple.
>
> Thank you.
> Daniel
>
> On 21 Apr 2016, at 16:52, B
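For reference, a hedged sketch of the Phoenix route described above, assuming the phoenix-spark data source is on the classpath; the table and ZooKeeper values are placeholders:

df.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")                                               // phoenix-spark expects overwrite; rows are upserted
  .options(Map("table" -> "MY_TABLE", "zkUrl" -> "zkhost:2181"))   // hypothetical values
  .save()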
Can someone explain to me how the new Structured Streaming works in the
upcoming Spark 2.0+? I’m a little hazy on how data will be stored and referenced
if it can be queried and/or batch processed directly from streams, and whether the
data will be append-only or there will be some sort of upsert
Next Thursday is Databricks' webinar on Spark 2.0. If you are attending, I bet
many are going to ask when the release will be. Last time they did this, Spark
1.6 came out not too long afterward.
> On Apr 28, 2016, at 5:21 AM, Sean Owen wrote:
>
> I don't know if anyone has
ty-group.com <mailto:daniel.ha...@veracity-group.com>>
> wrote:
> Hi Benjamin,
> Yes it should work.
>
> Let me know if you need further assistance I might be able to get the code
> I've used for that project.
>
> Thank you.
> Daniel
>
> On 24 Apr 2016, at 17:
?
Thanks,
Ben
> On Apr 21, 2016, at 6:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> The hbase-spark module in Apache HBase (coming with hbase 2.0 release) can do
> this.
>
> On Thu, Apr 21, 2016 at 6:52 AM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmai
I have data in a DataFrame loaded from a CSV file. I need to load this data
into HBase using an RDD formatted in a certain way.
val rdd = sc.parallelize(
Array(key1,
(ColumnFamily, ColumnName1, Value1),
(ColumnFamily, ColumnName2, Value2),
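A minimal sketch of deriving that shape from the DataFrame, with hypothetical column names; each row becomes one row key plus a sequence of (column family, qualifier, value) cells:

// hypothetical columns: "id" is the row key, "name" and "age" become cells in family "cf"
val hbaseReadyRdd = df.rdd.map { row =>
  val key = row.getAs[String]("id")
  (key, Seq(
    ("cf", "name", row.getAs[String]("name")),
    ("cf", "age", row.getAs[Int]("age").toString)
  ))
}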
I am trying to stream files from an S3 bucket using CDH 5.7.0’s version of
Spark 1.6.0. It seems not to work. I keep getting this error.
Exception in thread "JobGenerator" java.lang.VerifyError: Bad type on operand
stack
Exception Details:
Location:
could be wrong.
Thanks,
Ben
> On May 21, 2016, at 4:18 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Maybe more than one version of jets3t-xx.jar was on the classpath.
>
> FYI
>
> On Fri, May 20, 2016 at 8:31 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil
in hbase-spark module.
>
> Cheers
>
> On Apr 27, 2016, at 10:31 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
>
>> Hi Ted,
>>
>> Do you know when the release will be? I also see some documentation for
>> usage of the hb
I have a curiosity question. These forever/unlimited DataFrames/DataSets will
persist and be query capable. I still am foggy about how this data will be
stored. As far as I know, memory is finite. Will the data be spilled to disk
and be retrievable if the query spans data not in memory? Is
Hi Ofir,
I just recently saw the webinar with Reynold Xin. He mentioned the Spark
Session unification efforts, but I don’t remember the DataSet for Structured
Streaming aka Continuous Applications as he put it. He did mention streaming or
unlimited DataFrames for Structured Streaming so one
> On Sun, May 15, 2016 at 11:58 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Ofir,
>
> Please check the csvRdd api here:
> https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/CsvParser.scala#L150
>
> Thanks!
>
Karau <hol...@pigscanfly.ca> wrote:
>
> You could certainly use RDDs for that, you might also find using Dataset
> selecting the fields you need to construct the URL to fetch and then using
> the map function to be easier.
>
> On Thu, Apr 14, 2016 at 12:01 PM, Be
I was wondering what would be the best way to use JSON in Spark/Scala. I need to
lookup values of fields in a collection of records to form a URL and download
that file at that location. I was thinking an RDD would be perfect for this. I
just want to hear from others who might have more experience
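A hedged sketch of the Dataset-style approach suggested in the replies, with hypothetical JSON field names, a placeholder path, and a naive download step:

import scala.io.Source
import sqlContext.implicits._

case class Record(host: String, path: String)            // hypothetical fields in the JSON records

val records = sqlContext.read.json("s3n://bucket/records.json").as[Record]   // placeholder path
val contents = records.map { r =>
  val url = s"https://${r.host}/${r.path}"                // build the URL from each record's fields
  Source.fromURL(url).mkString                            // download the file at that location
}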
> Would you try the code below?
>
> val csvRDD = ...your processing for csv rdd..
> val df = new CsvParser().csvRdd(sqlContext, csvRDD, useHeader = true)
>
> Thanks!
>
> On 16 Apr 2016 1:35 a.m., "Benjamin Kim" <bbuil...@gmail.com
> <mailto:bbuil...
I see that the new CDH 5.7 has been released with the HBase Spark module
built-in. I was wondering if I could just download it and use the hbase-spark
jar file for CDH 5.5. Has anyone tried this yet?
Thanks,
Ben
Has anyone found an easy way to save a DataFrame into HBase?
Thanks,
Ben
It is included in Cloudera’s CDH 5.8.
> On Jul 22, 2016, at 6:13 PM, Mail.com wrote:
>
> Hbase Spark module will be available with Hbase 2.0. Is that out yet?
>
>> On Jul 22, 2016, at 8:50 PM, Def_Os wrote:
>>
>> So it appears it should be possible
It takes me to the directories instead of the webpage.
> On Jul 13, 2016, at 11:45 AM, manish ranjan <cse1.man...@gmail.com> wrote:
>
> working for me. What do you mean 'as supposed to'?
>
> ~Manish
>
>
>
> On Wed, Jul 13, 2016 at 11:45 AM, Benjamin Kim <
Has anyone noticed that the spark.apache.org is not working as supposed to?
From what I read, there is no more Contexts.
"SparkContext, SQLContext, HiveContext merged into SparkSession"
I have not tested it, so I don’t know if it’s true.
Cheers,
Ben
> On Jul 18, 2016, at 8:37 AM, Koert Kuipers wrote:
>
> in my codebase i would like to
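For reference, a minimal sketch of the Spark 2.0 entry point that replaces the separate contexts (an illustration, not from the original thread; the path is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("example")
  .enableHiveSupport()          // covers what HiveContext used to provide
  .getOrCreate()

val sc = spark.sparkContext     // the underlying SparkContext is still reachable
val df = spark.read.json("/path/to/data.json")   // placeholder path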
I recently got a sales email from SnappyData, and after reading the
documentation about what they offer, it sounds very similar to what Structured
Streaming will offer w/o the underlying in-memory, spill-to-disk, CRUD
compliant data storage in SnappyData. I was wondering if Structured Streaming
"options(key 'hashtag', frequencyCol 'retweets', timeSeriesColumn
> 'tweetTime' )"
> where 'tweetStreamTable' is created using the 'create stream table ...' SQL
> syntax.
>
>
> -
> Jags
> SnappyData blog <http://www.snappydata.io/blog>
> Download binary, s
I would like to know if anyone has tried using the hbase-spark module? I tried
to follow the examples in conjunction with CDH 5.8.0. I cannot find the
HBaseTableCatalog class in the module or in any of the Spark jars. Can someone
help?
Thanks,
Ben
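For what it's worth, a hedged sketch based on the hortonworks-spark/shc connector mentioned later in this thread, which is where HBaseTableCatalog lives in my reading; the catalog JSON and table names are placeholders, and whether the same class ships in CDH 5.8's hbase-spark is exactly the open question above:

import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

val catalog = """{
  "table":{"namespace":"default", "name":"my_table"},
  "rowkey":"key",
  "columns":{
    "id":{"cf":"rowkey", "col":"key", "type":"string"},
    "name":{"cf":"cf1", "col":"name", "type":"string"}
  }
}"""

df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()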
I was wondering if anyone, who is a Spark Scala developer, would be willing to
continue the work done for the Kudu connector?
https://github.com/apache/incubator-kudu/tree/master/java/kudu-spark/src/main/scala/org/kududb/spark/kudu
I have been testing and using Kudu for the past month and
>
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Asher,
>
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
> (1.8) version as our installation. The Scala (2.10.5) vers
her Krim <ak...@hubspot.com> wrote:
>
> Ben,
>
> That looks like a scala version mismatch. Have you checked your dep tree?
>
> Asher Krim
> Senior Software Engineer
>
>
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:
o, if you're seeing this locally, you might want to
> check which version of the scala sdk your IDE is using
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Hi Asher,
>
> I modified th
Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
>
> On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> Asher,
>
> It’s still the same. Do you have any other ideas?
>
> Cheers,
> Ben
Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried
to build it from source, but I cannot get it to work.
Thanks,
Ben
ltSource.createRelation(HBaseRelation.scala:51)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
If you can please help, I would be grateful.
Cheers,
Ben
>
Elek,
If I cannot use the HBase Spark module, then I’ll give it a try.
Thanks,
Ben
> On Jan 31, 2017, at 1:02 PM, Marton, Elek <h...@anzix.net> wrote:
>
>
> I tested this one with hbase 1.2.4:
>
> https://github.com/hortonworks-spark/shc
>
> Marton
>
>
.0. We are
> waiting for the move to Spark 2.0/2.1.
>
> And besides that, would you not want to work on a platform which is at least
> 10 times faster? What would that be?
>
> Regards,
> Gourav Sengupta
>
> On Thu, Feb 23, 2017 at 6:23 PM, Benjamin Kim <bbuil...@gmail.com
e you do not want to be writing code which needs you to update it once
> again in 6 months because newer versions of SPARK now find it deprecated.
>
>
> Regards,
> Gourav Sengupta
>
>
>
> On Fri, Feb 24, 2017 at 7:18 AM, Benjamin Kim <bbuil...@gmail.com
>
We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet
file from AWS S3. We can read the schema and show some data when the file is
loaded into a DataFrame, but when we try to do some operations, such as count,
we get this error below.
code and see if the issue resolves, then it can be
> hidden and read from Input Params.
>
> Thanks,
> Aakash.
>
>
> On 23-Feb-2017 11:54 PM, "Benjamin Kim" <bbuil...@gmail.com
> <mailto:bbuil...@gmail.com>> wrote:
> We are trying to use Spark 1.
Has anyone got some advice on how to remove the reliance on HDFS for storing
persistent data. We have an on-premise Spark cluster. It seems like a waste of
resources to keep adding nodes because of a lack of storage space only. I would
rather add more powerful nodes due to the lack of
> wrote:
>
> Your vendor should use the parquet internal compression and not take a
> parquet file and gzip it.
>
>> On 13 Feb 2017, at 18:48, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>> We are receiving files from an outside vendor who creates a Parqu
We are receiving files from an outside vendor who creates a Parquet data file
and Gzips it before delivery. Does anyone know how to Gunzip the file in Spark
and inject the Parquet data into a DataFrame? I thought using sc.textFile or
sc.wholeTextFiles would automatically Gunzip the file, but
I was wondering if anyone has tried to create Spark SQL tables on top of HBase
tables so that data in HBase can be accessed using Spark Thriftserver with SQL
statements? This is similar what can be done using Hive.
Thanks,
Ben
2 September 2016 at 23:08, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com
> <mailto:mdkhajaasm...@gmail.com>> wrote:
> Hi Kim,
>
> I am also looking for same information. Just got the same requirement today.
>
> Thanks,
> Asmath
>
> On Fri, Sep 2, 2016
Does anyone have any thoughts about using Spark SQL Thriftserver in Spark 1.6.2
instead of HiveServer2? We are considering abandoning HiveServer2 for it. Some
advice and gotchas would be nice to know.
Thanks,
Ben
Has anyone created tables using Spark SQL that directly connect to a JDBC data
source such as PostgreSQL? I would like to use Spark SQL Thriftserver to access
and query remote PostgreSQL tables. In this way, we can centralize data access
to Spark SQL tables along with PostgreSQL making it very
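A hedged sketch of one way to do it: read the remote table through the JDBC data source and register it in the same HiveContext the Thriftserver is using (connection details are placeholders):

val usersDf = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:postgresql://dbhost:5432/mydb",   // placeholder connection details
  "dbtable"  -> "public.users",
  "user"     -> "spark",
  "password" -> "****"
)).load()

usersDf.registerTempTable("pg_users")   // queryable over JDBC/ODBC if this runs in the Thriftserver's context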
xposing data, i.e. create hive
> tables which "point to" any other DB. I know Oracle provides their own SerDe
> for hive. Not sure about PG though.
>
> Once tables are created in hive, STS will automatically see it.
>
> On Wed, Sep 14, 2016 at 11:08 AM, Benjam
I have a table with data already in it that has primary keys generated by the
function monotonicallyIncreasingId. Now, I want to insert more data into it
with primary keys that will auto-increment from where the existing data left
off. How would I do this? There is no argument I can pass into
.
Thanks,
Ben
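One hedged workaround, not from the thread: look up the current maximum key and shift the generated IDs by it before appending. Column names are hypothetical, and the resulting keys are unique but not consecutive, since monotonicallyIncreasingId leaves gaps:

import org.apache.spark.sql.functions.{lit, max, monotonicallyIncreasingId}

// existingDf holds the data already in the table; newDf is the batch to append
val currentMax = existingDf.agg(max("id")).head().getLong(0)          // assumes "id" is a long
val newWithIds = newDf.withColumn("id", monotonicallyIncreasingId() + lit(currentMax + 1))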
> On Sep 16, 2016, at 3:29 PM, Nikolay Zhebet <phpap...@gmail.com> wrote:
>
> Hi! Can you split the init code from the current command? I think that is the main
> problem in your code.
>
> On 16 Sept 2016 at 8:26 PM, "Benjamin Kim" <bbuil...@gm
Has anyone using Spark 1.6.2 encountered very slow responses from pulling data
from PostgreSQL using JDBC? I can get to the table and see the schema, but when
I do a show, it takes very long or keeps timing out.
The code is simple.
val jdbcDF = sqlContext.read.format("jdbc").options(
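For reference, a hedged sketch of the read with the JDBC partitioning options that usually help when a single-connection pull is slow (connection details, column, and bounds are placeholders):

val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:postgresql://dbhost:5432/mydb",   // placeholder
  "dbtable"         -> "public.big_table",
  "user"            -> "spark",
  "password"        -> "****",
  "partitionColumn" -> "id",          // numeric column to split on
  "lowerBound"      -> "1",
  "upperBound"      -> "10000000",
  "numPartitions"   -> "8"            // reads happen over 8 parallel connections
)).load()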
We use Graphite/Grafana for custom metrics. We found Spark’s metrics not to be
customizable. So, we write directly using Graphite’s API, which was very easy
to do using Java’s socket library in Scala. It works great for us, and we are
going one step further using Sensu to alert us if there is
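A minimal sketch of writing a metric over Graphite's plaintext protocol with a plain socket, as described above; the host, port, and metric path are placeholders:

import java.io.PrintWriter
import java.net.Socket

// Graphite plaintext protocol: "<metric.path> <value> <epoch-seconds>\n", usually on port 2003
def sendToGraphite(metric: String, value: Double, host: String = "graphite.local", port: Int = 2003): Unit = {
  val socket = new Socket(host, port)
  val out = new PrintWriter(socket.getOutputStream, true)
  out.println(s"$metric $value ${System.currentTimeMillis() / 1000}")
  out.close()
  socket.close()
}

sendToGraphite("spark.jobs.records_processed", 12345)   // hypothetical metric path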
I am trying to implement checkpointing in my streaming application but I am
getting a not serializable error. Has anyone encountered this? I am deploying
this job in YARN clustered mode.
Here is a snippet of the main parts of the code.
object S3EventIngestion {
//create and setup streaming
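A hedged sketch of the checkpoint-friendly getOrCreate pattern, where everything is built inside the factory function so nothing from the enclosing object gets pulled into the closure; the directory and batch interval are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(checkpointDirectory: String): StreamingContext = {
  val conf = new SparkConf().setAppName("S3EventIngestion")
  val ssc = new StreamingContext(conf, Seconds(60))
  // define the DStreams and output operations here, inside the factory
  ssc.checkpoint(checkpointDirectory)
  ssc
}

val checkpointDirectory = "hdfs:///checkpoints/s3-event-ingestion"   // placeholder
val ssc = StreamingContext.getOrCreate(checkpointDirectory, () => createContext(checkpointDirectory))
ssc.start()
ssc.awaitTermination()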
Mich,
I know up until CDH 5.4 we had to add the HTrace jar to the classpath to make
it work using the command below. But after upgrading to CDH 5.7, it became
unnecessary.
echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >>
/etc/spark/conf/classpath.txt
Hope this helps.
> wrote:
>
> Is there only one process adding rows? because this seems a little risky if
> you have multiple threads doing that…
>
>> On Oct 8, 2016, at 1:43 PM, Benjamin Kim <bbuil...@gmail.com
>> <mailto:bbuil...@gmail.com>> wrote:
>>
>> Mich,
Has anyone worked with AWS Kinesis and retrieved data from it using Spark
Streaming? I am having issues where it’s returning no data. I can connect to
the Kinesis stream and describe using Spark. Is there something I’m missing?
Are there specific IAM security settings needed? I just simply
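A hedged sketch of the basic receiver setup (app, stream, region, and endpoint names are placeholders). Two common causes of empty results are starting from LATEST when no new records are arriving, and IAM credentials that lack DynamoDB access for the KCL lease table the receiver creates:

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kinesis.KinesisUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// assumes spark-streaming-kinesis-asl is on the classpath
val ssc = new StreamingContext(sc, Seconds(10))
val stream = KinesisUtils.createStream(
  ssc, "my-kinesis-app", "my-stream",
  "kinesis.us-east-1.amazonaws.com", "us-east-1",
  InitialPositionInStream.TRIM_HORIZON,   // LATEST returns nothing until new records arrive
  Seconds(10), StorageLevel.MEMORY_AND_DISK_2)
stream.map(bytes => new String(bytes)).print()
ssc.start()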