Please take a look at test_column_operators in python/pyspark/sql/tests.py
FYI
On Sat, Nov 14, 2015 at 11:49 PM, YaoPau wrote:
> I'm using pyspark 1.3.0, and struggling with what should be simple.
> Basically, I'd like to run this:
>
> site_logs.filter(lambda r: 'page_row'
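The gist of that pointer: DataFrame.filter takes a Column expression (the operators exercised by test_column_operators), not a Python lambda. A minimal Scala sketch of the same idea, assuming a string column named "url" (the column name is an assumption); the pyspark equivalents are in the comments:

import org.apache.spark.sql.DataFrame

// Hypothetical DataFrame with a string column "url".
def pageRows(siteLogs: DataFrame): DataFrame =
  siteLogs.filter(siteLogs("url").contains("page_row"))

// pyspark 1.3 equivalent, via Column operators:
//   site_logs.filter(site_logs.url.like('%page_row%'))
// A Python lambda only works on the underlying RDD:
//   site_logs.rdd.filter(lambda r: 'page_row' in r.url)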
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC
Cheers
On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote:
> I have tried with G1 GC. Please, if anyone can, provide their settings for GC.
> At the code level I am:
> 1. reading an ORC table using DataFrame
> 2. map
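For what it's worth, a minimal sketch of wiring G1 GC flags into a job via SparkConf; the flag values below are illustrative assumptions, not recommendations (see the InfoQ article above for actual tuning guidance):

import org.apache.spark.SparkConf

// Illustrative G1 settings only; tune the pause target and IHOP per the article.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=35")
  .set("spark.driver.extraJavaOptions", "-XX:+UseG1GC")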
Not sure on that, maybe someone else can chime in
On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar wrote:
> Hi Cody ,
>
> Thanks for the clarification. I will try to come up with some workaround.
>
> I have another doubt. When my job is restarted and recovers from the
>
Hi,
The YARN UI on port 18080 stops receiving updates for Spark jobs/tasks immediately
after it starts. We see only one task completed in the UI while the others haven't
got any resources, while in reality more than 5 tasks would have completed.
Hadoop - Amazon 2.6
Spark - 1.5
Thanks and Regards,
Suraj Sheth
Hi,
I am working on Spark 1.4, reading an ORC table using DataFrame and
converting that DF to an RDD.
In the Spark UI I observe that 50% of tasks are running at non-local locality
levels such as ANY, and very few at LOCAL.
What would be the possible reason for this?
Please help. I have even changed the locality settings.
Thanks
Kudos go to Josh.
Cheers
> On Nov 14, 2015, at 10:04 PM, Jerry Lam wrote:
>
> Hi Ted,
>
> That looks exactly like what happens. It has been 5 hrs now. The code was built
> for 1.4. Thank you very much!
>
> Best Regards,
>
> Jerry
>
> Sent from my iPhone
>
>> On 14
Not a direct answer to your question,
but it might be useful for you to check the Spring XD Spark integration.
https://github.com/spring-projects/spring-xd-samples/tree/master/spark-streaming-wordcount-java-processor
On Mon, Nov 16, 2015 at 6:14 AM, Muthu Jayakumar wrote:
> I
Hi
I wanted to know how to go about registering Scala functions as UDFs using
the Spark SQL
CREATE TEMPORARY FUNCTION statement.
Currently I do the following
/* convert prices to holding period returns */
object VaR extends Serializable {
def returns(prices: Seq[Double], horizon: Integer):
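Since CREATE TEMPORARY FUNCTION expects a Hive UDF class, one workaround is to register the Scala function programmatically. A sketch, assuming sqlContext is in scope; the body of returns() is an assumption based on the truncated snippet, and Int is used instead of java.lang.Integer for smoother schema inference:

/* convert prices to holding period returns */
object VaR extends Serializable {
  def returns(prices: Seq[Double], horizon: Int): Seq[Double] =
    prices.sliding(horizon + 1).map(w => w.last / w.head - 1.0).toSeq
}

// Register under a name callable from SQL, e.g.:
//   sqlContext.sql("SELECT returns(prices, 1) FROM holdings")
sqlContext.udf.register("returns", VaR.returns _)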
I'm having a problem connecting my Spark app to a Mesos cluster; any help on
the below question would be appreciated.
http://stackoverflow.com/questions/33727154/spark-shell-connecting-to-mesos-stuck-at-sched-cpp
Thanks,
Jong Wook
Hi,
What is the best practice for reading from DynamoDB in Spark? I know I
can use the Java API, but this doesn't seem to take data locality into
consideration at all.
I was looking for something along the lines of the cassandra connector:
https://github.com/datastax/spark-cassandra-connector
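One hedged option, assuming the emr-dynamodb-hadoop connector is available on the classpath: read through its Hadoop InputFormat, so scans at least flow through the standard Hadoop RDD machinery. It still won't give Cassandra-connector-style locality; the class names and config keys below are from that connector and worth double-checking, and the table name and endpoint are placeholders:

import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf

val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("dynamodb.input.tableName", "my-table")                  // assumed table
jobConf.set("dynamodb.endpoint", "dynamodb.us-east-1.amazonaws.com") // assumed region

// Each record arrives as a DynamoDBItemWritable holding the item's attributes.
val items = sc.hadoopRDD(jobConf, classOf[DynamoDBInputFormat],
  classOf[Text], classOf[DynamoDBItemWritable])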
Hi,
It could be that the timestamp of the file is old. Moving the file does not
update the file's timestamp. After you have launched the job, either
'touch' the file if it's already in /opt/test/ to update the timestamp or
'cp' the file to a temporary directory then 'mv' it to /opt/test/.
HTH,
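A minimal sketch to make that concrete, assuming textFileStream, an existing SparkContext sc, and a 10-second batch interval (both assumptions); files are only picked up if their modification time is newer than the stream's start:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))  // assumed batch interval
val lines = ssc.textFileStream("/opt/test/")     // watched directory from the thread
lines.print()
ssc.start()
// From a shell afterwards: 'cp' gives the file a fresh timestamp, and the
// atomic 'mv' into the watched directory preserves that new timestamp:
//   cp data.txt /tmp/staging/ && mv /tmp/staging/data.txt /opt/test/data.txt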
Hi, everyone! I deployed Spark in YARN mode on a cluster. I exported
SPARK_MASTER_IP with an IP address, and made
sure that all the Spark configuration files use the IP value in
SPARK_HOME/conf/*, and all the Hadoop configuration files use the IP value in
HADOOP_HOME/etc/*. I can successfully submit a Spark job by
What are the parameters on which locality depends?
On Sun, Nov 15, 2015 at 5:54 PM, Renu Yadav wrote:
> Hi,
>
> I am working on Spark 1.4, reading an ORC table using DataFrame and
> converting that DF to an RDD.
>
> In the Spark UI I observe that 50% of tasks are running at non-local
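For reference: the scheduler's willingness to wait for a local slot before falling back a level is controlled by the spark.locality.wait family. A minimal sketch of where they are set; the values shown are the defaults and purely illustrative. Beyond these, locality also depends on where the ORC file's HDFS blocks live relative to the executors:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "3s")          // default wait before dropping a level
  .set("spark.locality.wait.process", "3s")  // PROCESS_LOCAL -> NODE_LOCAL
  .set("spark.locality.wait.node", "3s")     // NODE_LOCAL -> RACK_LOCAL
  .set("spark.locality.wait.rack", "3s")     // RACK_LOCAL -> ANY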
Thanks Fengdong,
the startTime and endTime are null in the call(Iterator lines) method.
Java does not allow a top-level class to be static.
From the Spark docs, I can broadcast them, but I don't know how to receive
them from another class.
On 16 November 2015 at 16:16, Fengdong Yu
I want to pass two parameters into a new Java class from rdd.mapPartitions();
the code is like the following.
---Source Code
Main method:
/*the parameters that I want to pass into the PixelGenerator.class for
selecting any items between the startTime and the endTime.
*/
int startTime, endTime;
Can you try: new PixelGenerator(startTime, endTime)?
> On Nov 16, 2015, at 12:47 PM, Zhang, Jingyu wrote:
>
> I want to pass two parameters into a new Java class from rdd.mapPartitions();
> the code is like the following.
> ---Source Code
>
> Main method:
>
> /*the
Just make PixelGenerator a nested static class?
> On Nov 16, 2015, at 1:22 PM, Zhang, Jingyu wrote:
>
> Fengdong
Thanks, that worked in the local environment but not in the Spark cluster.
On 16 November 2015 at 16:05, Fengdong Yu wrote:
> Can you try: new PixelGenerator(startTime, endTime)?
>
>
>
> On Nov 16, 2015, at 12:47 PM, Zhang, Jingyu
> wrote:
>
If you got a “cannot be serialized” exception, then you need to make
PixelGenerator a static class.
> On Nov 16, 2015, at 1:10 PM, Zhang, Jingyu wrote:
>
> Thanks, that worked in the local environment but not in the Spark cluster.
>
>
> On 16 November 2015 at 16:05,
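For comparison, a sketch of the pattern from this thread in Scala (not the poster's actual Java code): the two parameters travel through the constructor of a small serializable class, so nothing from a non-serializable outer class gets captured. The comma-separated record layout is an assumption:

import org.apache.spark.rdd.RDD

class PixelGenerator(startTime: Int, endTime: Int) extends Serializable {
  // Keep lines whose (assumed) leading timestamp field falls in range.
  def call(lines: Iterator[String]): Iterator[String] =
    lines.filter { line =>
      val ts = line.split(",")(0).toInt
      ts >= startTime && ts <= endTime
    }
}

// Assuming rdd: RDD[String] and the two ints are in scope:
def select(rdd: RDD[String], startTime: Int, endTime: Int): RDD[String] = {
  val gen = new PixelGenerator(startTime, endTime)
  rdd.mapPartitions(gen.call)
}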
I would like to know whether Hive on Spark uses or shares the execution code
with Spark SQL or DataFrames.
More specifically, does Hive on Spark benefit from the changes made to
Spark SQL by Project Tungsten? Or is it a completely different execution path
where it creates its own plan and executes on RDDs?
It's a completely different path.
On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote:
> I would like to know whether Hive on Spark uses or shares the execution code
> with Spark SQL or DataFrames.
>
> More specifically, does Hive on Spark benefit from the changes made to
>
Hi,
While I am trying to read a JSON file using SQLContext, I get the
following error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.sql.SQLContext.<init>(Lorg/apache/spark/api/java/JavaSparkContext;)V
at com.honeywell.test.testhive.HiveSpark.main(HiveSpark.java:15)
The code looks good. Can you check the ‘import’ statements in your code?
Because it calls ‘honeywell.test’.
> On Nov 16, 2015, at 3:02 PM, Yogesh Vyas wrote:
>
> Hi,
>
> While I am trying to read a JSON file using SQLContext, I get the
> following error:
>
> Exception in
Ignore my input; I think HiveSpark.java is where your main method is located.
Can you paste the whole pom.xml and your code?
> On Nov 16, 2015, at 3:39 PM, Fengdong Yu wrote:
>
> The code looks good. Can you check the ‘import’ statements in your code? Because it
> calls
Also make sure your Scala version is 2.11 for your build.
> On Nov 16, 2015, at 3:43 PM, Fengdong Yu wrote:
>
> Ignore my inputs, I think HiveSpark.java is your main method located.
>
> can you paste the whole pom.xml and your code?
>
>
>
>
>> On Nov 16,
So it does not benefit from Project Tungsten, right?
On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote:
> It's a completely different path.
>
>
> On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote:
>
>> I would like to know if Hive on Spark uses or
No, it does not -- although it'd benefit from some of the work to make
shuffle more robust.
On Sun, Nov 15, 2015 at 10:45 PM, kiran lonikar wrote:
> So it does not benefit from Project Tungsten, right?
>
>
> On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin
What’s your SQL?
> On Nov 16, 2015, at 3:02 PM, Yogesh Vyas wrote:
>
> Hi,
>
> While I am trying to read a JSON file using SQLContext, I get the
> following error:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
>
I am just trying to read a JSON file with SQLContext and print the
DataFrame as follows:
SparkConf conf = new SparkConf().setMaster("local").setAppName("AppName");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
DataFrame df =
Hi,
I am using the Spark Streaming checkpointing mechanism and reading the data
from Kafka. The window duration for my application is 2 hrs with a sliding
interval of 15 minutes.
So, my batches run at following intervals...
- 09:45
- 10:00
- 10:15
- 10:30
- and so on
When my job is
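A minimal sketch of the setup described, for concreteness; the broker list, topic, checkpoint path, and the countByWindow aggregation are all illustrative assumptions:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///checkpoints/app"   // assumed path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("windowed-kafka")
  val ssc = new StreamingContext(conf, Minutes(15))
  ssc.checkpoint(checkpointDir)                 // required for windowed state
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))
  // 2 h window sliding every 15 min, matching the batches listed above.
  stream.map(_._2).countByWindow(Minutes(120), Minutes(15)).print()
  ssc
}

// Recovers from the checkpoint on restart, or builds a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()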
Sure
Thanks !!
On Sun, Nov 15, 2015 at 9:13 PM, Cody Koeninger wrote:
> Not sure on that, maybe someone else can chime in
>
> On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar
> wrote:
>
>> Hi Cody ,
>>
>> Thanks for the clarification. I will try to come
I have only written Akka code in Scala. Here is the Akka documentation
that would help you to get started:
http://doc.akka.io/docs/akka/2.4.0/intro/getting-started.html
> JavaSparkContext(conf)
The idea is to create a SparkContext and pass it via Props (a constructor, in
the Java sense) to an
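A hedged Scala sketch of that suggestion; the actor, message, and file names are assumptions:

import akka.actor.{Actor, ActorSystem, Props}
import org.apache.spark.{SparkConf, SparkContext}

case class WordCount(path: String)

// The SparkContext arrives through the actor's constructor (its Props).
class SparkActor(sc: SparkContext) extends Actor {
  def receive = {
    case WordCount(path) =>
      val counts = sc.textFile(path).flatMap(_.split("\\s+")).countByValue()
      sender() ! counts
  }
}

object Main extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("akka-spark").setMaster("local[*]"))
  val system = ActorSystem("demo")
  val actor = system.actorOf(Props(classOf[SparkActor], sc), "spark-actor")
  actor ! WordCount("/tmp/input.txt")   // assumed input path
}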