I'm using Hadoop 1.0.4 and Spark 1.2.0.
I'm facing a strange issue. I have a requirement to read a small file from
HDFS, and all its content has to be read in one shot. So I'm using the spark
context's wholeTextFiles API, passing the HDFS URL of the file.
When I try this from a spark shell it's
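For reference, a minimal sketch of that call on the Spark 1.2 API (the HDFS URL and file name below are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object WholeFileRead {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WholeFileRead"))
        // wholeTextFiles returns an RDD of (path, content) pairs, one per file
        val files = sc.wholeTextFiles("hdfs://localhost:54310/user/hduser/smallfile.txt")
        val (path, content) = files.first()  // the file's full content in one shot
        println(s"$path -> ${content.length} chars")
        sc.stop()
      }
    }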
Hi All,
I have a hive table where data from 2 different sources (S1 and S2) get
accumulated. Sample data below -
*RECORD_ID|SOURCE_TYPE|TRN_NO|DATE1|DATE2|BRANCH|REF1|REF2|REF3|REF4|REF5|REF6|DC_FLAG|AMOUNT|CURRENCY*
out of all columns you have
> used for joining. So, records 1 and 4 should generate the same hash value.
> 3. group by using this new id (you have already linked the records) and
> pull out the required fields.
>
> Please let the group know if it works...
>
> Best
> Ayan
>
>
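A minimal sketch of that suggestion, assuming the pipe-delimited layout above and that TRN_NO, BRANCH and AMOUNT are the join columns (substitute whichever columns you actually join on; the input path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    object HashLink {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HashLink"))
        val lines = sc.textFile("hdfs:///path/to/recon_data")
        val keyed = lines.map { line =>
          val f = line.split("\\|")
          // one hash value out of the join columns: TRN_NO=f(2), BRANCH=f(5), AMOUNT=f(13)
          val key = (f(2), f(5), f(13)).hashCode
          (key, line)
        }
        // records sharing the same key (e.g. records 1 and 4) are now linked;
        // group by it and pull out the required fields
        keyed.groupByKey().take(5).foreach(println)
        sc.stop()
      }
    }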
Dear All,
I'm using -
= Spark 1.2.0
= Hive 0.13.1
= Mesos 0.18.1
= Spring
= JDK 1.7
I've written a scala program which
= instantiates a spark and hive context
= parses an XML file which provides the where clauses for queries
= generates full-fledged hive queries to be run on hive
laptop having 4 CPUs and 12GB RAM.
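As a rough sketch, the context setup described above looks like this on the Spark 1.2.0 API (the Mesos master URL, table name and where clause are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("HiveQueryRunner").setMaster("mesos://host:5050")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)
    // each where clause parsed from the XML file becomes one full query
    val whereClause = "source_type = 'S1'"
    val result = hiveContext.sql(s"SELECT * FROM recon_table WHERE $whereClause")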
On Wed, Jul 29, 2015 at 2:49 PM, fightf...@163.com fightf...@163.com
wrote:
Hi, Sarath
Did you try to set and increase spark.executor.extraJavaOptions with
-XX:PermSize= -XX:MaxPermSize= ?
--
fightf...@163.com
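For what it's worth, one way to set those options (the sizes and app name below are only examples to tune):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("ReconJob")
      .set("spark.executor.extraJavaOptions", "-XX:PermSize=256m -XX:MaxPermSize=512m")
      .set("spark.driver.extraJavaOptions", "-XX:PermSize=256m -XX:MaxPermSize=512m")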
*From:* Sarath Chandra
with this option to
rule out a config problem.
On Wed, Jul 29, 2015 at 10:45 AM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Yes.
As mentioned in my mail at the end, I tried with both 256 and 512
options.
But the issue persists.
I'm giving the following parameters to spark
,
*Sarath Chandra Josyam*
Sr. Technical Architect
*Algofusion Technologies India Pvt. Ltd.*
Email: sarathchandra.jos...@algofusiontech.com
Phone: +91-80-65330112/113
Mobile: +91 8762491331
On Wed, Mar 4, 2015 at 5:08 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Hi,
I have
Hi,
I have a cluster running on CDH5.2.1 and I have a Mesos cluster (version
0.18.1). Through an Oozie java action I want to submit a Spark job to the
mesos cluster. Before configuring it as an Oozie job, I'm testing the java
action from the command line and getting the exception below. While running I'm
Hi All,
I have a requirement to process a set of files in parallel, so I'm
submitting spark jobs using java's ExecutorService. But when I do it this way,
1 or more jobs fail with status EXITED. Earlier I tried with a
standalone spark cluster, setting the job scheduling to Fair Scheduling. I
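One alternative sketch (not necessarily what you have): share a single SparkContext across threads and let FAIR scheduling run the per-file jobs concurrently. Paths and pool size are placeholders:

    import java.util.concurrent.Executors
    import scala.concurrent.duration.Duration
    import scala.concurrent.{Await, ExecutionContext, Future}
    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelFiles {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf()
          .setAppName("ParallelFiles")
          .set("spark.scheduler.mode", "FAIR"))  // fair scheduling between concurrent jobs
        implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))
        val paths = Seq("hdfs:///in/f1", "hdfs:///in/f2", "hdfs:///in/f3")
        // each action submitted from its own thread runs as a separate Spark job
        val jobs = paths.map(p => Future { (p, sc.textFile(p).count()) })
        jobs.foreach(f => println(Await.result(f, Duration.Inf)))
        sc.stop()
      }
    }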
Hi All,
I have a java program which submits a spark job to a standalone spark
cluster (2 nodes; 10 cores (6+4); 12GB (8+4)). It is called by
another java program through ExecutorService, which invokes it multiple times
with different sets of arguments and parameters. I have set spark memory
Hi All,
If my RDD holds an array/sequence of strings, how can I save it as an
HDFS file with each string on a separate line?
For example, if I write code as below, the output should get saved as an HDFS
file having one string per line
...
...
val newLines = lines.map(line => myfunc(line))
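saveAsTextFile writes exactly one element per line, so something like this should do (output paths are placeholders):

    // one string per line, one part-file per partition
    newLines.saveAsTextFile("hdfs://localhost:54310/user/hduser/output")
    // coalesce(1) first if a single output file is needed (fine for small data)
    newLines.coalesce(1).saveAsTextFile("hdfs://localhost:54310/user/hduser/output1")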
should show your
code since it may not be doing what you think.
If you instantiate an object, it happens every time your function is
called. map() is called once per data element; mapPartitions() once
per partition. It depends.
On Wed, Sep 10, 2014 at 3:25 PM, Sarath Chandra
sarathchandra.jos
it to workers. In the second, you're creating
SomeUnserializableManagerClass in the function and therefore on the
worker.
mapPartitions is better if this creation is expensive.
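To make the difference concrete, a sketch (SomeUnserializableManagerClass stands in for the expensive object, and its transform method is assumed):

    // map: the manager is built once per element, which can be expensive
    val out1 = lines.map { line =>
      val mgr = new SomeUnserializableManagerClass()
      mgr.transform(line)
    }

    // mapPartitions: built once per partition, amortizing the cost
    val out2 = lines.mapPartitions { iter =>
      val mgr = new SomeUnserializableManagerClass()
      iter.map(mgr.transform)
    }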
On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Hi,
I'm trying
Hi,
I'm trying to migrate a map-reduce program to work with spark. I migrated
the program from Java to Scala. The map-reduce program basically loads an
HDFS file, and for each line in the file it applies several transformation
functions available in various external libraries.
When I execute this
:
You can bring those classes out of the library and serialize them
(implements Serializable). It is not the right way of doing it, though it
solved a few of my similar problems.
Thanks
Best Regards
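Another common sketch, if pulling the classes out of the library is not an option: keep the library object @transient lazy inside a serializable wrapper, so it is rebuilt on each worker instead of being shipped (ExternalLibEngine and its transform method are made-up stand-ins):

    class TransformerWrapper extends Serializable {
      // not serialized; re-created lazily in each worker JVM on first use
      @transient lazy val engine = new ExternalLibEngine()
      def apply(line: String): String = engine.transform(line)
    }

    val wrapper = new TransformerWrapper()
    val transformed = lines.map(wrapper(_))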
On Fri, Sep 5, 2014 at 7:36 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote
, Jul 17, 2014 at 1:13 PM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
No Sonal, I'm not making any explicit call to stop the context.
If you see my previous post to Michael, the commented portion of the code
is my requirement. When I run this over standalone spark cluster
Hi All,
I'm trying to do a simple record matching between 2 files and wrote the
following code -
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
object SqlTest {
  case class Test(fld1: String, fld2: String, fld3: String, fld4: String,
    fld5: String, fld6: Double,
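A guess at how the truncated program might continue on the Spark 1.2 API, assuming the case class stops at the six fields shown (file paths, the comma delimiter and the join key are assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SqlTest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD-to-SchemaRDD conversion

    val file1 = sc.textFile("hdfs://localhost:54310/user/hduser/file1.csv")
      .map(_.split(","))
      .map(f => Test(f(0), f(1), f(2), f(3), f(4), f(5).toDouble))
    file1.registerTempTable("file1")
    // ... register file2 the same way, then match records on the key field(s)
    val matched = sqlContext.sql(
      "SELECT a.* FROM file1 a JOIN file2 b ON a.fld1 = b.fld1")
    matched.collect().foreach(println)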
are info messages.
What else do I need to check?
~Sarath
On Wed, Jul 16, 2014 at 7:23 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Check your executor logs for the output, or if your data is not big, collect
it in the driver and print it.
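i.e., for results small enough to fit in the driver (result standing in for your RDD):

    // only for small data: pull everything to the driver and print it
    result.collect().foreach(println)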
On Jul 16, 2014, at 9:21 AM, Sarath Chandra
On Wed, Jul 16, 2014 at 7:48 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
When you submit your job, it should appear on the Spark UI. Same with the
REPL. Make sure your job is submitted to the cluster properly.
On Wed, Jul 16, 2014 at 10:08 AM, Sarath Chandra
sarathchandra.jos
at 7:59 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Can you try submitting a very simple job to the cluster?
On Jul 16, 2014, at 10:25 AM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Yes, it is appearing on the Spark UI, and remains there with state
RUNNING till
On Wed, Jul 16, 2014 at 8:14 PM, Michael Armbrust mich...@databricks.com
wrote:
What if you just run something like:
sc.textFile("hdfs://localhost:54310/user/hduser/file1.csv").count()
On Wed, Jul 16, 2014 at 10:37 AM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Yes