Help needed optimizing Spark History Server performance

2024-05-03 Thread Vikas Tharyani
Dear Spark Community,

I'm writing to seek your expertise in optimizing the performance of our
Spark History Server (SHS) deployed on Amazon EKS. We're encountering
timeouts (HTTP 504) when loading large event logs exceeding 5 GB.

*Our Setup:*

   - Deployment: SHS on EKS behind an Nginx ingress (idle connection timeout:
   60 seconds)
   - Instance: memory-optimized, with sufficient RAM and CPU
   - Spark daemon memory: 30 GB
   - Spark History Server options: see sparkHistoryOpts below
   - K8s namespace memory limit: 128 GB
   - The backend S3 bucket has a lifecycle policy that deletes objects older
   than *7 days*.

sparkHistoryOpts:
"-Dspark.history.fs.logDirectory=s3a:///eks-infra-use1/
-Dspark.history.retainedApplications=1
-Dspark.history.ui.maxApplications=20
-Dspark.history.store.serializer=PROTOBUF
-Dspark.hadoop.fs.s3a.threads.max=25
-Dspark.hadoop.fs.s3a.connection.maximum=650
-Dspark.hadoop.fs.s3a.readahead.range=512K
-Dspark.history.fs.endEventReparseChunkSize=2m
-Dspark.history.store.maxDiskUsage=30g"

*Problem:*

   - SHS times out when loading large event logs (8 GB or more).

*Request:*

We would greatly appreciate any insights or suggestions you may have to
improve the performance of our SHS and prevent these timeouts. Here are
some areas we're particularly interested in exploring:

   - Are there additional configuration options we should consider for
   handling large event logs?
   - Could Nginx configuration adjustments help with timeouts? (see the
   sketch after this list)
   - Are there best practices for optimizing SHS performance on EKS?
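For concreteness, this is the kind of change we are considering - a sketch
only (the annotations assume the ingress-nginx controller, and all values are
illustrative):

# Ingress annotations to raise the proxy timeouts past 60 seconds:
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"

# Possible addition to sparkHistoryOpts (event-log replay parallelism):
-Dspark.history.fs.numReplayThreads=16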

We appreciate any assistance you can provide.

Thank you for your time and support.

Sincerely,
-- 

Vikas Tharyani

Associate Manager, DevOps

Nielsen

www.nielsen.com <https://global.nielsen.com/>




Spark History Server in GCP

2022-04-04 Thread Gnana Kumar
Hi There,

I have been able to start the Spark History Server in a GKE Kubernetes
cluster.

And I have created a service account in my Google project with the Storage
Admin, Storage Object Admin, and Owner permissions.

Now, when I submit the job using spark-submit with the below options to
write all of the Spark job's event logs to the Spark History Server's
Google Storage bucket,

--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=gs://spark-history-server-gnana/ \
--conf spark.eventLog.permissions=777 \
--conf spark.history.fs.logDirectory=gs://spark-history-server-gnana/ \
--conf spark.kubernetes.driver.secrets.sparklogs=/etc/secrets \
--conf spark.kubernetes.executor.secrets.sparklogs=/etc/secrets \

I'm getting the below exception.

com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException:
403 Forbidden

*Note: In my Spark image, I have added the GCS connector jar
"gcs-connector-hadoop3-latest.jar" to access the GCS bucket.*
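For completeness, here are the auth settings I believe the connector needs -
a sketch only (the property names are from GCS connector 2.x for Hadoop 3,
and the key path assumes the JSON key is what's mounted from the sparklogs
secret; both are assumptions):

--conf spark.hadoop.fs.gs.auth.service.account.enable=true \
--conf spark.hadoop.fs.gs.auth.service.account.json.keyfile=/etc/secrets/key.json \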

Please help me resolve this issue with respect to GCS Connector and Cloud
Storage bucket.

Thanks
Gnana


spark 3.1.1 history server fails to boot with scala/MatchError

2021-05-20 Thread Bulldog20630405
we have spark 2.4.x clusters running fine; however when running spark 3.1.1
the spark history server fails to boot

note: we build spark 3.1.1 from source for hadoop 3.2.1
we use supervisord to start services so our start command is:

$SPARK_HOME/bin/spark-class org.apache.spark.deploy.history.HistoryServer

Error: A JNI error occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError:
scala/MatchError
...
caused by: java.lang.ClassNotFoundException: scala.MatchError
...

Is there a different HistoryServer class for Spark 3 using Scala 2.12?
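(For reference, a NoClassDefFoundError on a scala.* class suggests the Scala
runtime jars may be missing from the server's classpath; a quick check,
assuming the standard distribution layout:)

ls $SPARK_HOME/jars | grep scala-library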


Spark History Server on S3 doesn't show incomplete jobs

2021-05-17 Thread Tianbin Jiang
Hi all,
 I am using Spark 2.4.5. I am redirecting the Spark event logs to S3
with the following configuration:

spark.eventLog.enabled = true
spark.history.ui.port = 18080
spark.eventLog.dir = s3://livy-spark-log/spark-history/
spark.history.fs.logDirectory = s3://livy-spark-log/spark-history/
spark.history.fs.update.interval = 5s


Once my application completes, I can see it show up on the Spark
history server. However, running applications don't show up under
"incomplete applications". I have also checked the logs; whenever my
application ends, I see these messages:

21/05/17 06:14:18 INFO k8s.KubernetesClusterSchedulerBackend: Shutting down
all executors
21/05/17 06:14:18 INFO
k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each
executor to shut down
21/05/17 06:14:18 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes
client has been closed (this is expected if the application is shutting
down.)
*21/05/17 06:14:18 INFO s3n.MultipartUploadOutputStream: close closed:false
s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress*
*21/05/17 06:14:19 INFO s3n.S3NativeFileSystem: rename
s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress
s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6*
21/05/17 06:14:19 INFO spark.MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
21/05/17 06:14:19 INFO memory.MemoryStore: MemoryStore cleared
21/05/17 06:14:19 INFO storage.BlockManager: BlockManager stopped


I am not able to see any xx.inprogress file on S3, though. Has anyone had
this problem before?

-- 

Sincerely:
 Tianbin Jiang


Re: Spark History Server log files questions

2021-03-23 Thread German Schiavon
Hey!

I don't think you can do selective removals - I've never heard of it, but
who knows..

You can refer to https://spark.apache.org/docs/latest/monitoring.html to see
all the available options.

In my experience, having 4 days' worth of logs is enough; usually if
something fails you check it right away, unless it is the weekend. But
depending on the use case you could store more days.
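For the retention side, the built-in cleaner settings are the relevant ones -
a sketch of spark-defaults entries (assuming the stock filesystem history
provider; the 4-day window is just an example):

spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 4d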



On Mon, 22 Mar 2021 at 23:52, Hung Vu  wrote:

> Hi,
>
> I have couple questions to ask regarding the Spark history server:
>
> 1. Is there a way for a cluster to selectively clean old files? For
> example, if we want to keep some logs from 3 days ago but also cleaned some
> logs from 2 days ago, is there a filter or config to do that?
> 2. We have over 1000 log files each day. If we want to keep those jobs for
> a week (7000 jobs in total), this would potentially make the load time
> longer. Is there any suggestion on doing this?
> 3. We plan to have 2 paths to long-term history server and short-term
> history server. We can move some log files from short-term to long-term
> server if we need to do some investigation on that, would this be a good
> idea. Do you have any input on this?
>
> Thank you in advance!
>


Spark History Server log files questions

2021-03-22 Thread Hung Vu
Hi,

I have a couple of questions regarding the Spark history server:

1. Is there a way for a cluster to selectively clean old files? For
example, if we want to keep some logs from 3 days ago but also clean some
logs from 2 days ago, is there a filter or config to do that?
2. We have over 1000 log files each day. If we want to keep those jobs for
a week (7000 jobs in total), this would potentially make the load time
longer. Are there any suggestions for handling this?
3. We plan to have 2 paths: a long-term history server and a short-term
history server. We would move log files from the short-term to the long-term
server if we need to investigate them. Would this be a good idea? Do you
have any input on this?

Thank you in advance!


Running Spark history Server at Context localhost:18080/sparkhistory

2019-08-19 Thread Sandish Kumar HN
Hi,

I want to run the Spark History Server at the context path
localhost:18080/sparkhistory instead of at the root localhost:18080.

The end goal is to access the Spark History Server with a domain name, i.e.,
domainname/sparkhistory.

Are there any hacks or Spark config options?
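Would something like the following work? (a sketch I have not verified for
the history server specifically: Nginx in front, with spark.ui.proxyBase
telling the UI its external prefix)

# spark-env.sh on the history server host
export SPARK_HISTORY_OPTS="-Dspark.ui.proxyBase=/sparkhistory"

# nginx server block
location /sparkhistory/ {
    proxy_pass http://localhost:18080/;
}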
-- 

Thanks,
Regards,
SandishKumar HN


Re: Spark UI History server on Kubernetes

2019-01-23 Thread Li Gao
In addition to what Rao mentioned, if you are using cloud blob storage such
as AWS S3, you can specify your history location to be an S3 location such
as:  `s3://mybucket/path/to/history`


On Wed, Jan 23, 2019 at 12:55 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:

> Hi Lakshman,
>
>
>
> We’ve set these 2 properties to bringup spark history server
>
>
>
> spark.history.fs.logDirectory 
>
> spark.history.ui.port 
>
>
>
> We’re writing the logs to HDFS. In order to write logs, we’re setting
> following properties while submitting the spark job
>
> spark.eventLog.enabled true
>
> spark.eventLog.dir 
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Wednesday, January 23, 2019 1:55 PM
> *To:* Rao, Abhishek (Nokia - IN/Bangalore) 
> *Subject:* Re: Spark UI History server on Kubernetes
>
>
>
> HI Abhishek,
>
>
>
> Thank you for your response. Could you please let me know the properties
> you configured for bringing up History Server and its UI.
>
>
>
> Also, are you writing the logs to any directory on persistent storage, if
> yes, could you let me know the changes you did in Spark to write logs to
> that directory. Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>
>
> On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore) <
> abhishek@nokia.com> wrote:
>
> Hi,
>
>
>
> We’ve setup spark-history service (based on spark 2.4) on K8S. UI works
> perfectly fine when running on NodePort. We’re facing some issues when on
> ingress.
>
> Please let us know what kind of inputs do you need?
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman 
> *Sent:* Tuesday, January 22, 2019 6:02 PM
> *To:* user@spark.apache.org
> *Subject:* Spark UI History server on Kubernetes
>
>
>
> Hello,
>
>
>
> We are running Spark 2.4 on Kubernetes cluster, able to access the Spark
> UI using "kubectl port-forward".
>
>
>
> However, this spark UI contains currently running Spark application logs,
> we would like to maintain the 'completed' spark application logs as well.
> Could someone help us to setup 'Spark History server' on Kubernetes. Thanks!
>
>
>
> Best Regards,
>
> Lakshman Battini.
>
>


RE: Spark UI History server on Kubernetes

2019-01-23 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Lakshman,

We’ve set these 2 properties to bring up the Spark History Server:

spark.history.fs.logDirectory 
spark.history.ui.port 

We’re writing the logs to HDFS. In order to write the logs, we’re setting the
following properties while submitting the Spark job:
spark.eventLog.enabled true
spark.eventLog.dir 

Thanks and Regards,
Abhishek

From: Battini Lakshman 
Sent: Wednesday, January 23, 2019 1:55 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) 
Subject: Re: Spark UI History server on Kubernetes

HI Abhishek,

Thank you for your response. Could you please let me know the properties you 
configured for bringing up History Server and its UI.

Also, are you writing the logs to any directory on persistent storage, if yes, 
could you let me know the changes you did in Spark to write logs to that 
directory. Thanks!

Best Regards,
Lakshman Battini.

On Tue, Jan 22, 2019 at 10:53 PM Rao, Abhishek (Nokia - IN/Bangalore)
<abhishek@nokia.com> wrote:
Hi,

We’ve setup spark-history service (based on spark 2.4) on K8S. UI works 
perfectly fine when running on NodePort. We’re facing some issues when on 
ingress.
Please let us know what kind of inputs do you need?

Thanks and Regards,
Abhishek

From: Battini Lakshman <battini.laksh...@gmail.com>
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI 
using "kubectl port-forward".

However, this spark UI contains currently running Spark application logs, we 
would like to maintain the 'completed' spark application logs as well. Could 
someone help us to setup 'Spark History server' on Kubernetes. Thanks!

Best Regards,
Lakshman Battini.


RE: Spark UI History server on Kubernetes

2019-01-22 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi,

We’ve set up the spark-history service (based on Spark 2.4) on K8s. The UI
works perfectly fine when running on NodePort. We’re facing some issues when
on ingress.
Please let us know what kind of inputs you need.

Thanks and Regards,
Abhishek

From: Battini Lakshman 
Sent: Tuesday, January 22, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes

Hello,

We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI 
using "kubectl port-forward".

However, this spark UI contains currently running Spark application logs, we 
would like to maintain the 'completed' spark application logs as well. Could 
someone help us to setup 'Spark History server' on Kubernetes. Thanks!

Best Regards,
Lakshman Battini.


Spark UI History server on Kubernetes

2019-01-22 Thread Battini Lakshman
Hello,

We are running Spark 2.4 on a Kubernetes cluster and are able to access the
Spark UI using "kubectl port-forward".

However, this Spark UI contains only the currently running Spark applications;
we would like to retain the 'completed' Spark application logs as well.
Could someone help us set up the 'Spark History Server' on Kubernetes? Thanks!

Best Regards,
Lakshman Battini.


[Spark cluster standalone v2.4.0] - problems with reverse proxy functionality regarding submitted applications in cluster mode and the spark history server ui

2019-01-03 Thread Cheikh_SOW
Hello,

I have many Spark clusters in standalone mode with 3 nodes each. One of them
is in HA with 3 masters and 3 workers, and everything regarding the HA is
working fine. The second one is not in HA mode, and has one master and 3
workers.

In both of them, I have configured the reverse proxy for the UIs and
everything is working fine except two things:

1 - When I submit an application in cluster mode and the driver is launched
on a worker different from the one where I ran the submit command, access
to the application UI through the master UI's reverse proxy is broken and
we get this error: *HTTP ERROR 502 \n Problem accessing
/myCustomProxyBase/proxy/app-20181203110740-. Reason: \n \tBad Gateway*
The worker UIs and the page that lists the running executors
(/myCustomProxyBase/app/?appId=app-20181218104130-0006) work fine.
However, when the driver is launched on the same worker where the submit
command is executed, access to the application through the master works
fine, as do the other URIs (the problem disappears).

2 - When the reverse proxy is set, I can no longer access the history
server UI (many JS and CSS errors) in either cluster.

I really need help.

Thanks,
Cheikh SOW



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Spark cluster standalone v2.4.0] - problems with reverse proxy functionality regarding submitted applications in cluster mode and the spark history server ui

2018-12-20 Thread Cheikh_SOW
Hello,

I have many Spark clusters in standalone mode with 3 nodes each. One of them
is in HA with 3 masters and 3 workers, and everything regarding the HA is
working fine. The second one is not in HA mode, and has one master and 3
workers.

In both of them, I have configured the reverse proxy for the UIs and
everything is working fine except two things:

1 - When I submit an application in cluster mode and the driver is launched
on a worker different from the one where I ran the submit command, access
to the application UI through the master UI's reverse proxy is broken and
we get this error: *HTTP ERROR 502 \n Problem accessing
/myCustomProxyBase/proxy/app-20181203110740-. Reason: \n \tBad Gateway*
The worker UIs and the page that lists the running executors
(/myCustomProxyBase/app/?appId=app-20181218104130-0006) work fine.
However, when the driver is launched on the same worker where the submit
command is executed, access to the application through the master works
fine, as do the other URIs (the problem disappears).

2 - When the reverse proxy is set, I can no longer access the history
server UI (many JS and CSS errors) in either cluster.

I really need help.

Thanks,
Cheikh SOW



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Upgrading spark history server, no logs showing.

2018-11-27 Thread bbarks
I finally circled back and tinkered with this, eventually finding the
solution. It turned out to be HDFS permissions on the history files.

For whatever reason, our HDFS perms worked fine with spark 2.0.2 and 2.1.2,
but when we ran spark 2.3.0 it wouldn't load any history in the UI.

I found out the history files are written with perms 0770, which is
hard-coded in the spark source. I just had to chown the history directory to
the same group the history service user is in, then set the facl so the
group has default permissions on all files added to the directory.
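For example, something like this (a sketch - the directory and group names
are illustrative):

hdfs dfs -chgrp -R spark /spark-history
hdfs dfs -setfacl -R -m default:group::rwx /spark-history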




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Upgrading spark history server, no logs showing.

2018-07-12 Thread bbarks
Hi, 

We have multiple installations of Spark on our clusters. They reside in
different directories, which the jobs point to when they run.

For a couple of years now, we've run our history server off spark 2.0.2. We
have 2.1.2, 2.2.1 and 2.3.0 installed as well. I've tried upgrading to run
the server out of the 2.3.0 install. The UI loads, but will not show logs.

For fun, I then tried 2.2.1 - same deal. However, when I ran 2.1.2, it worked
(albeit with a JS error about missing data in some table cell or row).

Are there any special steps for upgrading the history server between Spark
versions? I've combed over the settings multiple times; it all seems fine.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



History server and non-HDFS filesystems

2017-11-17 Thread Paul Mackles
Hi - I had originally posted this as a bug (SPARK-22528) but given my
uncertainty, it was suggested that I send it to the mailing list instead...

We are using Azure Data Lake (ADL) to store our event logs. This worked
fine in 2.1.x, but in 2.2.0 the underlying files are no longer visible to
the history server - even though we are using the same service principal
that was used to write the logs. I tracked it down to this call in
"FSHistoryProvider" (which was added for v2.2.0):


SparkHadoopUtil.checkAccessPermission()


From what I can tell, it preemptively checks the permissions on the
files and skips the ones it thinks are not readable. The problem
is that it uses a check that appears to be specific to HDFS, so even
though the files are definitely readable, it skips over them. Also,
"FSHistoryProvider"
is the only place this code is used.

I was able to work around it by either:

* setting the permissions for the files on ADL to world-readable

* or setting HADOOP_PROXY to the objectId of the Azure service principal
which owns the files

Neither of these workarounds is acceptable for our environment. That said,
I am not sure how this should be addressed:

* Is this an issue with Azure/Hadoop not complying with the Hadoop
FileSystem interface/contract in some way?

* Is this an issue with "checkAccessPermission()" not really accounting for
all of the possible FileSystem implementations?

My gut tells me it's the latter, because
SparkHadoopUtil.checkAccessPermission()
gets its "currentUser" info from outside of the FileSystem class, and it
doesn't make sense to me that an instance of FileSystem would affect a
global context, since there could be many FileSystem instances in a given
app.

That said, I know ADL is not heavily used at this time, so I wonder if
anyone is seeing this with S3 as well? Maybe not, since S3 permissions are
always reported as world-readable (I think), which causes
checkAccessPermission()
to succeed.

Any thoughts or feedback appreciated.

-- 
Thanks,
Paul


Heap Settings for History Server

2017-07-31 Thread N Sa
Hi folks,

I couldn't find much literature on this so I figured I could ask here.

Does anyone have experience in tuning the memory settings and interval
times of the Spark History Server?
Let's say I have 500 applications at 0.5 G each with a
*spark.history.fs.update.interval*  of 400s.
Is there a direct memory correlation that can help me set an optimum value?

Looking for advice from anyone who has tuned the History Server to render
large numbers of applications.
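For context, the knob I've found so far is the daemon heap in spark-env.sh
(a sketch, with an illustrative value):

export SPARK_DAEMON_MEMORY=8g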

Thanks.
-- 
Regards,
Neelesh S. Salian


Re: Spark history server running on Mongo

2017-07-19 Thread Ivan Sadikov
Yes, you are absolutely right - though the UI does not change often, and this
potentially allows faster iteration, IMHO, which is why I started working on
this. For me, it felt like this functionality could easily be outsourced to
a separate project.

And, as you pointed out, I did add some small fixes to the UI, but I still
have a fair amount of them to add. It is all mentioned in the repo README,
by the way.

Thanks for clearing things up.
Have a good day!

Ivan


On Thu, 20 Jul 2017 at 5:52 AM, Marcelo Vanzin  wrote:

> On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov 
> wrote:
> > Repository that I linked to does not require rebuilding Spark and could
> be
> > used with current distribution, which is preferable in my case.
>
> Fair enough, although that means that you're re-implementing the Spark
> UI, which makes that project have to constantly be modified to keep up
> with UI changes in Spark (or create its own UI and forget about what
> Spark does). Which is what Spree does too.
>
> In the long term I believe having these sort of enhancements in Spark
> itself would benefit more people.
>
> --
> Marcelo
>


Re: Spark history server running on Mongo

2017-07-19 Thread Marcelo Vanzin
On Tue, Jul 18, 2017 at 7:21 PM, Ivan Sadikov  wrote:
> Repository that I linked to does not require rebuilding Spark and could be
> used with current distribution, which is preferable in my case.

Fair enough, although that means that you're re-implementing the Spark
UI, which makes that project have to constantly be modified to keep up
with UI changes in Spark (or create its own UI and forget about what
Spark does). Which is what Spree does too.

In the long term I believe having these sorts of enhancements in Spark
itself would benefit more people.

-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Hi Marcelo,

Thanks for the reference again. I looked at your code - really great work!
I had to replace the Spark distribution to use it, though - I could not
figure out how to build it separately.

The repository that I linked to does not require rebuilding Spark and can be
used with the current distribution, which is preferable in my case.


Kind regards,

Ivan



On Wed, 19 Jul 2017 at 4:44 AM, Ivan Sadikov <ivan.sadi...@gmail.com> wrote:

> Thanks for JIRA ticket reference! Frankly, I was aware of this work, but
> didn't know that there was an API for storage implementation.
>
> Will try exploring that as well, thanks!
> On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> See SPARK-18085. That has much of the same goals re: SHS resource
>> usage, and also provides a (currently non-public) API where you could
>> just create a MongoDB implementation if you want.
>>
>> On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <ivan.sadi...@gmail.com>
>> wrote:
>> > Hello everyone!
>> >
>> > I have been working on Spark history server that uses MongoDB as a
>> datastore
>> > for processed events to iterate on idea that Spree project uses for
>> Spark
>> > UI. Project was originally designed to improve on standalone history
>> server
>> > with reduced memory footprint.
>> >
>> > Project lives here: https://github.com/lightcopy/history-server
>> >
>> > These are just very early days of the project, sort of pre-alpha (some
>> > features are missing, and metrics in some failed jobs cases are
>> > questionable). Code is being tested on several 8gb and 2gb logs and
>> aims to
>> > lower resource usage since we run history server together with several
>> other
>> > systems.
>> >
>> > Would greatly appreciate any feedback on repository (issues/pull
>> > requests/suggestions/etc.). Thanks a lot!
>> >
>> >
>> > Cheers,
>> >
>> > Ivan
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>


Re: Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Thanks for the JIRA ticket reference! Frankly, I was aware of this work, but
didn't know that there was an API for the storage implementation.

Will try exploring that as well, thanks!
On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> See SPARK-18085. That has much of the same goals re: SHS resource
> usage, and also provides a (currently non-public) API where you could
> just create a MongoDB implementation if you want.
>
> On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <ivan.sadi...@gmail.com>
> wrote:
> > Hello everyone!
> >
> > I have been working on Spark history server that uses MongoDB as a
> datastore
> > for processed events to iterate on idea that Spree project uses for Spark
> > UI. Project was originally designed to improve on standalone history
> server
> > with reduced memory footprint.
> >
> > Project lives here: https://github.com/lightcopy/history-server
> >
> > These are just very early days of the project, sort of pre-alpha (some
> > features are missing, and metrics in some failed jobs cases are
> > questionable). Code is being tested on several 8gb and 2gb logs and aims
> to
> > lower resource usage since we run history server together with several
> other
> > systems.
> >
> > Would greatly appreciate any feedback on repository (issues/pull
> > requests/suggestions/etc.). Thanks a lot!
> >
> >
> > Cheers,
> >
> > Ivan
> >
>
>
>
> --
> Marcelo
>


Re: Spark history server running on Mongo

2017-07-18 Thread Marcelo Vanzin
See SPARK-18085. That has much of the same goals re: SHS resource
usage, and also provides a (currently non-public) API where you could
just create a MongoDB implementation if you want.

On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <ivan.sadi...@gmail.com> wrote:
> Hello everyone!
>
> I have been working on Spark history server that uses MongoDB as a datastore
> for processed events to iterate on idea that Spree project uses for Spark
> UI. Project was originally designed to improve on standalone history server
> with reduced memory footprint.
>
> Project lives here: https://github.com/lightcopy/history-server
>
> These are just very early days of the project, sort of pre-alpha (some
> features are missing, and metrics in some failed jobs cases are
> questionable). Code is being tested on several 8gb and 2gb logs and aims to
> lower resource usage since we run history server together with several other
> systems.
>
> Would greatly appreciate any feedback on repository (issues/pull
> requests/suggestions/etc.). Thanks a lot!
>
>
> Cheers,
>
> Ivan
>



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark history server running on Mongo

2017-07-18 Thread Ivan Sadikov
Hello everyone!

I have been working on a Spark history server that uses MongoDB as a
datastore for processed events, iterating on the idea that the Spree project
uses for the Spark UI. The project was originally designed to improve on the
standalone history server with a reduced memory footprint.

Project lives here: https://github.com/lightcopy/history-server

These are just the very early days of the project, sort of pre-alpha (some
features are missing, and metrics in some failed-job cases are
questionable). The code is being tested on several 8 GB and 2 GB logs and
aims to lower resource usage, since we run the history server together with
several other systems.

Would greatly appreciate any feedback on repository (issues/pull
requests/suggestions/etc.). Thanks a lot!


Cheers,

Ivan


Re: Why spark history server does not show RDD even if it is persisted?

2017-03-01 Thread Parag Chaudhari
Thanks!



Thanks,
Parag Chaudhari
USC Alumnus (Fight On!)
Mobile: (213)-572-7858
Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254


On Tue, Feb 28, 2017 at 12:53 PM, Shixiong(Ryan) Zhu <
shixi...@databricks.com> wrote:

> The REST APIs are not just for Spark history server. When an application
> is running, you can use the REST APIs to talk to Spark UI HTTP server as
> well.
>
> On Tue, Feb 28, 2017 at 10:46 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> ping...
>>
>>
>>
>> *Thanks,Parag Chaudhari,**USC Alumnus (Fight On!)*
>> *Mobile : (213)-572-7858 <(213)%20572-7858>*
>> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254
>> <http://www.linkedin.com/pub/parag-chaudhari/28/a55/254>*
>>
>>
>> On Wed, Feb 22, 2017 at 7:54 PM, Parag Chaudhari <paragp...@gmail.com>
>> wrote:
>>
>>> Thanks!
>>>
>>> If spark does not log these events in event log then why spark history
>>> server provides an API to get RDD information?
>>>
>>> From the documentation,
>>>
>>> /applications/[app-id]/storage/rdd   A list of stored RDDs for the
>>> given application.
>>>
>>> /applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
>>> status of a given RDD.
>>>
>>>
>>>
>>>
>>> *Thanks,Parag Chaudhari,**USC Alumnus (Fight On!)*
>>> *Mobile : (213)-572-7858 <(213)%20572-7858>*
>>> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254
>>> <http://www.linkedin.com/pub/parag-chaudhari/28/a55/254>*
>>>
>>>
>>> On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> It is too verbose, and will significantly increase the size event log.
>>>>
>>>> Here is the comment in the code:
>>>>
>>>> // No-op because logging every update would be overkill
>>>>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>>>>
>>>>>
>>>> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks a lot the information!
>>>>>
>>>>> Is there any reason why EventLoggingListener ignore this event?
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>>
>>>>> *​Parag​*
>>>>>
>>>>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it
>>>>>> will not be written into event-log, I think that's why you cannot get 
>>>>>> such
>>>>>> info in history server.
>>>>>>
>>>>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>>>>>
>>>>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>>>>> myrdd.setName("test")
>>>>>>> myrdd.cache
>>>>>>> myrdd.collect
>>>>>>>
>>>>>>> But I am not able to see any RDD info in "storage" tab in spark
>>>>>>> history server.
>>>>>>>
>>>>>>> I looked at this
>>>>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>>>>> but it is not helping as I have exact similar program mentioned there. 
>>>>>>> Can
>>>>>>> anyone help?
>>>>>>>
>>>>>>>
>>>>>>> *Thanks,*
>>>>>>>
>>>>>>> *​Parag​*
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-28 Thread Shixiong(Ryan) Zhu
The REST APIs are not just for Spark history server. When an application is
running, you can use the REST APIs to talk to Spark UI HTTP server as well.
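For example (host, port, and app id are illustrative):

curl http://driver-host:4040/api/v1/applications/<app-id>/storage/rdd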

On Tue, Feb 28, 2017 at 10:46 AM, Parag Chaudhari <paragp...@gmail.com>
wrote:

> ping...
>
>
>
> *Thanks,Parag Chaudhari,**USC Alumnus (Fight On!)*
> *Mobile : (213)-572-7858 <(213)%20572-7858>*
> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254
> <http://www.linkedin.com/pub/parag-chaudhari/28/a55/254>*
>
>
> On Wed, Feb 22, 2017 at 7:54 PM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> Thanks!
>>
>> If spark does not log these events in event log then why spark history
>> server provides an API to get RDD information?
>>
>> From the documentation,
>>
>> /applications/[app-id]/storage/rdd   A list of stored RDDs for the given
>> application.
>>
>> /applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
>> status of a given RDD.
>>
>>
>>
>>
>> *Thanks,Parag Chaudhari,**USC Alumnus (Fight On!)*
>> *Mobile : (213)-572-7858 <(213)%20572-7858>*
>> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254
>> <http://www.linkedin.com/pub/parag-chaudhari/28/a55/254>*
>>
>>
>> On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com>
>> wrote:
>>
>>> It is too verbose, and will significantly increase the size event log.
>>>
>>> Here is the comment in the code:
>>>
>>> // No-op because logging every update would be overkill
>>>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>>>
>>>>
>>> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
>>> wrote:
>>>
>>>> Thanks a lot the information!
>>>>
>>>> Is there any reason why EventLoggingListener ignore this event?
>>>>
>>>> *Thanks,*
>>>>
>>>>
>>>> *​Parag​*
>>>>
>>>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>>> wrote:
>>>>
>>>>> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it
>>>>> will not be written into event-log, I think that's why you cannot get such
>>>>> info in history server.
>>>>>
>>>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>>>>
>>>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>>>> myrdd.setName("test")
>>>>>> myrdd.cache
>>>>>> myrdd.collect
>>>>>>
>>>>>> But I am not able to see any RDD info in "storage" tab in spark
>>>>>> history server.
>>>>>>
>>>>>> I looked at this
>>>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>>>> but it is not helping as I have exact similar program mentioned there. 
>>>>>> Can
>>>>>> anyone help?
>>>>>>
>>>>>>
>>>>>> *Thanks,*
>>>>>>
>>>>>> *​Parag​*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-28 Thread Parag Chaudhari
ping...



Thanks,
Parag Chaudhari
USC Alumnus (Fight On!)
Mobile: (213)-572-7858
Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254


On Wed, Feb 22, 2017 at 7:54 PM, Parag Chaudhari <paragp...@gmail.com>
wrote:

> Thanks!
>
> If spark does not log these events in event log then why spark history
> server provides an API to get RDD information?
>
> From the documentation,
>
> /applications/[app-id]/storage/rdd   A list of stored RDDs for the given
> application.
>
> /applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
> status of a given RDD.
>
>
>
>
> *Thanks,Parag Chaudhari,**USC Alumnus (Fight On!)*
> *Mobile : (213)-572-7858 <(213)%20572-7858>*
> *Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254
> <http://www.linkedin.com/pub/parag-chaudhari/28/a55/254>*
>
>
> On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> It is too verbose, and will significantly increase the size event log.
>>
>> Here is the comment in the code:
>>
>> // No-op because logging every update would be overkill
>>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>>
>>>
>> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
>> wrote:
>>
>>> Thanks a lot the information!
>>>
>>> Is there any reason why EventLoggingListener ignore this event?
>>>
>>> *Thanks,*
>>>
>>>
>>> *​Parag​*
>>>
>>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it
>>>> will not be written into event-log, I think that's why you cannot get such
>>>> info in history server.
>>>>
>>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>>>
>>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>>> myrdd.setName("test")
>>>>> myrdd.cache
>>>>> myrdd.collect
>>>>>
>>>>> But I am not able to see any RDD info in "storage" tab in spark
>>>>> history server.
>>>>>
>>>>> I looked at this
>>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>>> but it is not helping as I have exact similar program mentioned there. Can
>>>>> anyone help?
>>>>>
>>>>>
>>>>> *Thanks,*
>>>>>
>>>>> *​Parag​*
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks!

If Spark does not log these events in the event log, then why does the Spark
History Server provide an API to get RDD information?

From the documentation,

/applications/[app-id]/storage/rdd   A list of stored RDDs for the given
application.

/applications/[app-id]/storage/rdd/[rdd-id]   Details for the storage
status of a given RDD.




Thanks,
Parag Chaudhari
USC Alumnus (Fight On!)
Mobile: (213)-572-7858
Profile: http://www.linkedin.com/pub/parag-chaudhari/28/a55/254


On Wed, Feb 22, 2017 at 7:44 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> It is too verbose, and will significantly increase the size event log.
>
> Here is the comment in the code:
>
> // No-op because logging every update would be overkill
>> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>>
>>
> On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> Thanks a lot the information!
>>
>> Is there any reason why EventLoggingListener ignore this event?
>>
>> *Thanks,*
>>
>>
>> *​Parag​*
>>
>> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
>> wrote:
>>
>>> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it
>>> will not be written into event-log, I think that's why you cannot get such
>>> info in history server.
>>>
>>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>>
>>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>>> myrdd.setName("test")
>>>> myrdd.cache
>>>> myrdd.collect
>>>>
>>>> But I am not able to see any RDD info in "storage" tab in spark history
>>>> server.
>>>>
>>>> I looked at this
>>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>>> but it is not helping as I have exact similar program mentioned there. Can
>>>> anyone help?
>>>>
>>>>
>>>> *Thanks,*
>>>>
>>>> *​Parag​*
>>>>
>>>
>>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
It is too verbose, and would significantly increase the size of the event log.

Here is the comment in the code:

// No-op because logging every update would be overkill
> override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {}
>
>
On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari <paragp...@gmail.com>
wrote:

> Thanks a lot the information!
>
> Is there any reason why EventLoggingListener ignore this event?
>
> *Thanks,*
>
>
> *​Parag​*
>
> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it
>> will not be written into event-log, I think that's why you cannot get such
>> info in history server.
>>
>> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am running spark shell in spark version 2.0.2. Here is my program,
>>>
>>> var myrdd = sc.parallelize(Array.range(1, 10))
>>> myrdd.setName("test")
>>> myrdd.cache
>>> myrdd.collect
>>>
>>> But I am not able to see any RDD info in "storage" tab in spark history
>>> server.
>>>
>>> I looked at this
>>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>>> but it is not helping as I have exact similar program mentioned there. Can
>>> anyone help?
>>>
>>>
>>> *Thanks,*
>>>
>>> *​Parag​*
>>>
>>
>>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks a lot for the information!

Is there any reason why EventLoggingListener ignores this event?

Thanks,
Parag

On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:

> AFAIK, Spark's EventLoggingListerner ignores BlockUpdate event, so it will
> not be written into event-log, I think that's why you cannot get such info
> in history server.
>
> On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am running spark shell in spark version 2.0.2. Here is my program,
>>
>> var myrdd = sc.parallelize(Array.range(1, 10))
>> myrdd.setName("test")
>> myrdd.cache
>> myrdd.collect
>>
>> But I am not able to see any RDD info in "storage" tab in spark history
>> server.
>>
>> I looked at this
>> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
>> but it is not helping as I have exact similar program mentioned there. Can
>> anyone help?
>>
>>
>> *Thanks,*
>>
>> *​Parag​*
>>
>
>


Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it will
not be written into the event log; I think that's why you cannot see such info
in the history server.

On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari <paragp...@gmail.com>
wrote:

> Hi,
>
> I am running spark shell in spark version 2.0.2. Here is my program,
>
> var myrdd = sc.parallelize(Array.range(1, 10))
> myrdd.setName("test")
> myrdd.cache
> myrdd.collect
>
> But I am not able to see any RDD info in "storage" tab in spark history
> server.
>
> I looked at this
> <https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
> but it is not helping as I have exact similar program mentioned there. Can
> anyone help?
>
>
> *Thanks,*
>
> *​Parag​*
>


Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Hi,

I am running spark-shell on Spark 2.0.2. Here is my program:

var myrdd = sc.parallelize(Array.range(1, 10))
myrdd.setName("test")
myrdd.cache
myrdd.collect

But I am not able to see any RDD info in "storage" tab in spark history
server.

I looked at this
<https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html>
but it is not helping as I have exact similar program mentioned there. Can
anyone help?


Thanks,
Parag


Re: how can I set the log configuration file for spark history server ?

2016-12-09 Thread Marcelo Vanzin
(-dev)

Just configure your log4j.properties in $SPARK_HOME/conf (or set a
custom $SPARK_CONF_DIR for the history server).
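For example, a size-capped rolling file via log4j.properties (a sketch for
the log4j 1.x bundled with Spark 2.x; the path and sizes are illustrative):

log4j.rootCategory=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=/var/log/spark/history-server.log
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n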

On Thu, Dec 8, 2016 at 7:20 PM, John Fang <xiaojian@alibaba-inc.com> wrote:
> ./start-history-server.sh
> starting org.apache.spark.deploy.history.HistoryServer, logging to
> /home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
>
> Then the history will print all log to the XXX.sqa.zmf.out, so i can't limit
> the file max size.  I want limit the size of the log file



-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: how can I set the log configuration file for spark history server ?

2016-12-08 Thread Don Drake
You can update $SPARK_HOME/spark-env.sh by setting the environment
variable SPARK_HISTORY_OPTS.

See
http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options
for options (spark.history.fs.logDirectory) you can set.

There is log rotation (by time, not size) built into the history server;
you need to enable/configure it.
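For example, in spark-env.sh (a sketch; the HDFS path is illustrative):

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs:///spark-history -Dspark.history.fs.cleaner.enabled=true"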

Hope that helps.

-Don

On Thu, Dec 8, 2016 at 9:20 PM, John Fang <xiaojian@alibaba-inc.com>
wrote:

> ./start-history-server.sh
> starting org.apache.spark.deploy.history.HistoryServer,
> logging to /home/admin/koala/data/versions/0/SPARK/2.0.2/
> spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.
> spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
>
> Then the history will print all log to the XXX.sqa.zmf.out, so i can't
> limit the file max size.  I want limit the size of the log file
>



-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143


how can I set the log configuration file for spark history server ?

2016-12-08 Thread John Fang
./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to 
/home/admin/koala/data/versions/0/SPARK/2.0.2/spark-2.0.2-bin-hadoop2.6/logs/spark-admin-org.apache.spark.deploy.history.HistoryServer-1-v069166214.sqa.zmf.out
Then the history server prints all logs to XXX.sqa.zmf.out, so I can't limit
the max file size. I want to limit the size of the log file.

Re: Passing Custom App Id for consumption in History Server

2016-09-03 Thread ayan guha
How about this:

1. You create a primary key in your custom system.
2. Schedule the job with the custom primary key as the job name.
3. After setting up the Spark context (inside the job), get the application
id, then save the mapping of app name to app id from the Spark job to your
custom database through some web service. (A sketch follows this list.)
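A minimal sketch of step 3, assuming the primary key is carried in the app
name (the job-name convention and the reportMapping helper are hypothetical
placeholders for your web-service call):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper standing in for the web-service call to the custom system.
def reportMapping(appName: String, appId: String): Unit =
  println(s"would POST mapping $appName -> $appId to the metadata service")

// Primary key used as the app name; the master is supplied by spark-submit.
val conf = new SparkConf().setAppName("job-key-12345")
val sc = new SparkContext(conf)

// The cluster manager assigns the application id once the context is up.
reportMapping(sc.appName, sc.applicationId)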



On Sun, Sep 4, 2016 at 12:30 AM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> Default implementation is to add milliseconds. For mesos it is
> framework-id. If you are using mesos, you can assume that your framework id
> used to register your app is same as app-id.
> As you said, you have a system application to schedule spark jobs, you can
> keep track of framework-ids submitted by your application, you can use the
> same info.
>
> On Fri, Sep 2, 2016 at 6:29 PM, Amit Shanker <amit.shank...@gmail.com>
> wrote:
>
>> Currently Spark sets current time in Milliseconds as the app Id. Is there
>> a way one can pass in the app id to the spark job, so that it uses this
>> provided app id instead of generating one using time?
>>
>> Lets take the following scenario : I have a system application which
>> schedules spark jobs, and records the metadata for that job (say job
>> params, cores, etc). In this system application, I want to link every job
>> with its corresponding UI (history server). The only way I can do this is
>> if I have the app Id of that job stored in this system application. And the
>> only way one can get the app Id is by using the
>> SparkContext.getApplicationId() function - which needs to be run from
>> inside the job. So, this make it difficult to convey this piece of
>> information from spark to a system outside spark.
>>
>> Thanks,
>> Amit Shanker
>>
>
>


-- 
Best Regards,
Ayan Guha


Re: Passing Custom App Id for consumption in History Server

2016-09-03 Thread Raghavendra Pandey
Default implementation is to add milliseconds. For mesos it is
framework-id. If you are using mesos, you can assume that your framework id
used to register your app is same as app-id.
As you said, you have a system application to schedule spark jobs, you can
keep track of framework-ids submitted by your application, you can use the
same info.

On Fri, Sep 2, 2016 at 6:29 PM, Amit Shanker <amit.shank...@gmail.com>
wrote:

> Currently Spark sets current time in Milliseconds as the app Id. Is there
> a way one can pass in the app id to the spark job, so that it uses this
> provided app id instead of generating one using time?
>
> Lets take the following scenario : I have a system application which
> schedules spark jobs, and records the metadata for that job (say job
> params, cores, etc). In this system application, I want to link every job
> with its corresponding UI (history server). The only way I can do this is
> if I have the app Id of that job stored in this system application. And the
> only way one can get the app Id is by using the
> SparkContext.getApplicationId() function - which needs to be run from
> inside the job. So, this make it difficult to convey this piece of
> information from spark to a system outside spark.
>
> Thanks,
> Amit Shanker
>


Passing Custom App Id for consumption in History Server

2016-09-02 Thread Amit Shanker
Currently Spark sets the current time in milliseconds as the app id. Is
there a way one can pass the app id to the Spark job, so that it uses this
provided app id instead of generating one from the time?

Let's take the following scenario: I have a system application which
schedules Spark jobs and records the metadata for each job (say job
params, cores, etc.). In this system application, I want to link every job
with its corresponding UI (history server). The only way I can do this is
if I have the app id of that job stored in this system application. And the
only way one can get the app id is by using the
SparkContext.getApplicationId() function - which needs to be run from
inside the job. So, this makes it difficult to convey this piece of
information from Spark to a system outside Spark.

Thanks,
Amit Shanker


Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
   1. SPARK-16859 submitted


On Tue, Aug 2, 2016 at 9:07 PM, Andrei Ivanov  wrote:

> OK, answering myself - this is broken since 1.6.2 by SPARK-13845
> 
>
> On Tue, Aug 2, 2016 at 12:10 AM, Andrei Ivanov 
> wrote:
>
>> Hi all,
>>
>> I've just tried upgrading Spark to 2.0 and so far it looks generally good.
>>
> But there is at least one issue I see right away - job histories are
> missing storage information (persisted RDDs).
> This info is also missing from pre-upgrade jobs.
>>
>> Does anyone have a clue what can be wrong?
>>
>> Thanks, Andrei Ivanov.
>>
>
>


Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
OK, answering myself - this is broken since 1.6.2 by SPARK-13845


On Tue, Aug 2, 2016 at 12:10 AM, Andrei Ivanov  wrote:

> Hi all,
>
> I've just tried upgrading Spark to 2.0 and so far it looks generally good.
>
> But there is at least one issue I see right away - job histories are
> missing storage information (persisted RDDs).
> This info is also missing from pre-upgrade jobs.
>
> Does anyone have a clue what can be wrong?
>
> Thanks, Andrei Ivanov.
>


Spark 2.0 History Server Storage

2016-08-01 Thread Andrei Ivanov
Hi all,

I've just tried upgrading Spark to 2.0 and so far it looks generally good.

But there is at least one issue I see right away - job histories are
missing storage information (persisted RDDs).
This info is also missing from pre-upgrade jobs.

Does anyone have a clue what can be wrong?

Thanks, Andrei Ivanov.


Re: Redirect from yarn to spark history server

2016-05-02 Thread Marcelo Vanzin
See http://spark.apache.org/docs/latest/running-on-yarn.html,
especially the parts that talk about
spark.yarn.historyServer.address.
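For example, in the spark-defaults.conf used when submitting (the host and
paths are illustrative):

spark.eventLog.enabled true
spark.eventLog.dir hdfs:///spark-history
spark.yarn.historyServer.address shs-host:18080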

On Mon, May 2, 2016 at 2:14 PM, satish saley <satishsale...@gmail.com> wrote:
>
>
> Hello,
>
> I am running a pyspark job using yarn-cluster mode. I can see the Spark job
> in YARN, but I am not able to get from the "log history" link in YARN to the
> Spark history server. How would I keep track of a YARN log and its
> corresponding log in the Spark history server? Is there any setting in
> YARN/Spark that lets me redirect to the Spark history server from YARN?
>
> Best,
> Satish



-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Redirect from yarn to spark history server

2016-05-02 Thread satish saley
Hello,

I am running a pyspark job using yarn-cluster mode. I can see the Spark job
in YARN, but I am not able to get from the "log history" link in YARN to the
Spark history server. How would I keep track of a YARN log and its
corresponding log in the Spark history server? Is there any setting in
YARN/Spark that lets me redirect to the Spark history server from YARN?

Best,
Satish


Problem with History Server

2016-04-13 Thread alvarobrandon
Hello:

I'm using the history server to keep track of the applications I run in my
cluster. I'm using Spark with YARN.
When I run an application, it finishes correctly; even YARN says that it
finished. This is the result from the YARN Resource Manager API:

{u'app': [{u'runningContainers': -1, u'allocatedVCores': -1, u'clusterId':
1460540049690, u'amContainerLogs':
u'http://parapide-2.rennes.grid5000.fr:8042/node/containerlogs/container_1460540049690_0001_01_01/abrandon',
u'id': u'*application_1460540049690_0001*', u'preemptedResourceMB': 0,
u'finishedTime': 1460550170085, u'numAMContainerPreempted': 0, u'user':
u'abrandon', u'preemptedResourceVCores': 0, u'startedTime': 1460548211207,
u'elapsedTime': 1958878, u'state': u'FINISHED',
u'numNonAMContainerPreempted': 0, u'progress': 100.0, u'trackingUI':
u'History', u'trackingUrl':
u'http://paranoia-1.rennes.grid5000.fr:8088/proxy/application_1460540049690_0001/A',
u'allocatedMB': -1, u'amHostHttpAddress':
u'parapide-2.rennes.grid5000.fr:8042', u'memorySeconds': 37936274,
u'applicationTags': u'', u'name': u'KMeans', u'queue': u'default',
u'vcoreSeconds': 13651, u'applicationType': u'SPARK', u'diagnostics': u'',
u'finalStatus': u'*SUCCEEDED*'}

However, when I query the Spark UI:

[screenshot: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n26777/Screen_Shot_2016-04-13_at_14.png>]

You can see that for Job ID 2 no tasks have run and I can't get information
about them. Is this some kind of bug?

Thanks for your help as always



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-History-Server-tp26777.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: History Server Refresh?

2016-04-12 Thread Miles Crawford
It is completed apps that are not showing up. I'm fine with incomplete apps
not appearing.

On Tue, Apr 12, 2016 at 6:43 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
>
> Hey there. I have my spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters, I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this s3 location, and that works fine
> too - at least on startup.
>
> The problem is that the history server doesn't seem to notice new logs
> arriving into the S3 bucket.  Any idea how I can get it to scan the folder
> for new files?
>
> Thanks,
> -miles
>
>
> s3 isn't a real filesystem, and apps writing to it don't have any data
> written until one of
>  -the output stream is close()'d. This happens at the end of the app
>  -the file is set up to be partitioned and a partition size is crossed
>
> Until either of those conditions are met, the history server isn't going
> to see anything.
>
> If you are going to use s3 as the dest, and you want to see incomplete
> apps, then you'll need to configure the spark job to have smaller partition
> size (64? 128? MB).
>
> If it's completed apps that aren't being seen by the HS, then that's a
> bug, though if its against s3 only, likely to be something related to
> directory listings
>


Re: History Server Refresh?

2016-04-12 Thread Steve Loughran

On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:

Hey there. I have my spark applications set up to write their event logs into 
S3 - this is super useful for ephemeral clusters, I can have persistent history 
even though my hosts go away.

A history server is set up to view this s3 location, and that works fine too - 
at least on startup.

The problem is that the history server doesn't seem to notice new logs arriving 
into the S3 bucket.  Any idea how I can get it to scan the folder for new files?

Thanks,
-miles

s3 isn't a real filesystem, and apps writing to it don't have any data written
until one of:
 - the output stream is close()'d; this happens at the end of the app
 - the file is set up to be partitioned and a partition size is crossed

Until either of those conditions are met, the history server isn't going to see 
anything.

If you are going to use s3 as the dest, and you want to see incomplete apps, 
then you'll need to configure the spark job to have smaller partition size (64? 
128? MB).

If it's completed apps that aren't being seen by the HS, then that's a bug, 
though if it's against S3 only, likely to be something related to directory 
listings
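
For reference, the knobs involved on the history-server side look like this (a sketch assuming Spark 1.4+, where the directory-scan interval is spark.history.fs.update.interval with a 10s default, and an s3a:// log directory; the bucket path below is a placeholder):

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://my-bucket/spark-events \
  -Dspark.history.fs.update.interval=30s"
./sbin/start-history-server.sh

The interval only controls how often the server lists the directory; the close()/partition-size caveats above still decide when S3 exposes any data to list.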


Check if spark master/history server is running via Java

2016-04-12 Thread Mihir Monani
Hi,

How can I check whether the Spark master / history server is running on a
node? Is there a command for it?

I would like to accomplish it with java if possible.

Thanks,
Mihir Monani
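
One hedged approach, since both daemons answer HTTP: probe their ports from Java. A sketch assuming the default ports (8080 for the standalone master UI, 18080 for the history server) and Spark 1.4+ for the /api/v1 REST path; adjust hosts and ports to your deployment:

import java.net.HttpURLConnection;
import java.net.URL;

public class CheckSparkDaemons {
    public static void main(String[] args) {
        probe("http://localhost:8080/");                     // standalone master web UI
        probe("http://localhost:18080/api/v1/applications"); // history server REST API
    }

    // HTTP 200 means the daemon is up; a connect failure means nothing is listening
    private static void probe(String address) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(5000);
            conn.setRequestMethod("GET");
            System.out.println(address + " -> HTTP " + conn.getResponseCode());
        } catch (Exception e) {
            System.out.println(address + " -> not reachable (" + e.getMessage() + ")");
        }
    }
}

The same check works from a shell with curl against those two URLs.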


History Server Refresh?

2016-04-11 Thread Miles Crawford
Hey there. I have my spark applications set up to write their event logs
into S3 - this is super useful for ephemeral clusters, I can have
persistent history even though my hosts go away.

A history server is set up to view this s3 location, and that works fine
too - at least on startup.

The problem is that the history server doesn't seem to notice new logs
arriving into the S3 bucket.  Any idea how I can get it to scan the folder
for new files?

Thanks,
-miles


Documentation for "hidden" RESTful API for submitting jobs (not history server)

2016-03-14 Thread Hyukjin Kwon
Hi all,


While googling Spark, I accidentally found a RESTful API existing in Spark
for submitting jobs.

The link is here, http://arturmkrtchyan.com/apache-spark-hidden-rest-api

As Josh said, I can see the history of this RESTful API,
https://issues.apache.org/jira/browse/SPARK-5388 and also good
documentation here, as a PDF,
https://issues.apache.org/jira/secure/attachment/12696651/stable-spark-submit-in-standalone-mode-2-4-15.pdf
.

My question is: I cannot find anything about this, except for its history,
on the Spark website.

I tried to search JIRAs but could not find anything related to the
documentation for this either.

Wouldn't it be great if users knew how to use this?

Has this already been considered? Or are there some documents for this on the
website?

Please give me some feedback.


Thanks!


Re: Spark History Server NOT showing Jobs with Hortonworks

2016-02-19 Thread Steve Loughran

this is set up to save history to the timeline service, something which works 
provided the applications are all set up to publish there too.

On 18 Feb 2016, at 22:22, Sutanu Das <sd2...@att.com> 
wrote:

Hi Community,

Challenged with Spark issues with Hortonworks  (HDP 2.3.2_Spark 1.4.1) – The 
Spark History Server is NOT showing the Spark Running Jobs in Local Mode

The local-host:4040/app/v1 is ALSO not working

How can I look at my local Spark job?


# Generated by Apache Ambari. Fri Feb  5 00:37:06 2016

spark.history.kerberos.keytab none
spark.history.kerberos.principal none


this tells the history server to use ATS
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider


spark.history.ui.port 18080
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 2048
spark.yarn.executor.memoryOverhead 2048
spark.yarn.historyServer.address has-dal-0001.corp.wayport.net:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000

this says: publish via it
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.submit.file.replication 3



There's some asynchronous publishing, so things don't appear immediately as the 
app starts, and the updates can take a bit to trickle out, but things look set 
up right to work both ways.


I'll email you off the list and see if I can help track down what's happening

-steve





Re: Spark History Server NOT showing Jobs with Hortonworks

2016-02-18 Thread Divya Gehlot
Hi Sutanu ,

When you run your spark shell,
you would see the below lines in your console:

16/02/18 21:43:53 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4041
16/02/18 21:43:53 INFO Utils: Successfully started service 'SparkUI' on
port 4041.
16/02/18 21:43:54 INFO SparkUI: Started SparkUI at http://xx.xx.xx.xxx:4041

In my case, instead of the default port, the UI started on port 4041.

Hope this helps.

Thanks,
Divya



On 19 February 2016 at 07:09, Mich Talebzadeh <m...@peridale.co.uk> wrote:

> Is port 4040 already in use on your host? It should be the default.
>
>
>
> Example
>
>
>
> *netstat -plten|grep 4040*
>
>
>
> tcp0  0 :::4040
> :::*LISTEN  1009   42748209   *22778*/java
>
>
>
> *ps -ef|grep 22778*
>
>
>
> hduser   22778 22770  0 08:34 pts/1    00:01:18 /usr/java/latest/bin/java
> -cp
> /home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/conf/:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
> -Dscala.usejavacp=true -Xms1G -Xmx1G -XX:MaxPermSize=256m
> org.apache.spark.deploy.SparkSubmit --master spark://50.140.197.217:7077
> --class org.apache.spark.repl.Main --name Spark shell spark-shell
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>
>
>
>
>
> *From:* Sutanu Das [mailto:sd2...@att.com]
> *Sent:* 18 February 2016 22:58
> *To:* Mich Talebzadeh <m...@peridale.co.uk>; user@spark.apache.org
>
> *Subject:* RE: Spark History Server NOT showing Jobs with Hortonworks
>
>
>
> Hi Mich, Community - Do I need to specify it in the properties file in my
> spark-submit ?
>
>
>
> *From:* Mich Talebzadeh [mailto:m...@peridale.co.uk]
>
> *Sent:* Thursday, February 18, 2016 4:28 PM
> *To:* Sutanu Das; user@spark.apache.org
> *Subject:* RE: Spark History Server NOT showing Jobs with Hortonworks
>
>
>
> The jobs are normally shown under :4040/jobs/ in a normal setup
> not using any vendor’s flavour
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>
>
>
>
>
> *From:* Sutanu Das [mailto:sd2...@att.com]
> *Sent:* 18 February 2016 22:22
> *To:* user@spark.apache.org
> *Subject:* Spark History Server NOT showing Jobs with Hortonworks
>
>
>
> Hi Community,
>
>
>
> Challenged with Spark issues with *Hortonworks*  (HDP 2.3.2_Spark 1.4.1)
> – The Spark History Server is NOT showing the Spark Running Jobs in Local
> Mode
>
>
>
> The local-host:4040/app/v1 is ALSO not working
>
>
>
> How can I look at my local Spark job?
>
>
>
>
>
> # Generated by Apache Ambari. Fri Feb  5 00:37:06 2016
>
>
>
> spark.history.kerberos.keytab none
>
> spark.history.kerberos.principal none
>
> spark.history.provider
> org.

RE: Spark History Server NOT showing Jobs with Hortonworks

2016-02-18 Thread Mich Talebzadeh
Is port 4040 already in use on your host? It should be the default.

 

Example

 

netstat -plten|grep 4040

 

tcp0  0 :::4040 :::*
LISTEN  1009   42748209   22778/java

 

ps -ef|grep 22778

 

hduser   22778 22770  0 08:34 pts/1    00:01:18 /usr/java/latest/bin/java
-cp /home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/conf/:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
-Dscala.usejavacp=true -Xms1G -Xmx1G -XX:MaxPermSize=256m
org.apache.spark.deploy.SparkSubmit --master spark://50.140.197.217:7077
--class org.apache.spark.repl.Main --name Spark shell spark-shell

 

Dr Mich Talebzadeh

 

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

 


 

 

From: Sutanu Das [mailto:sd2...@att.com] 
Sent: 18 February 2016 22:58
To: Mich Talebzadeh <m...@peridale.co.uk>; user@spark.apache.org
Subject: RE: Spark History Server NOT showing Jobs with Hortonworks

 

Hi Mich, Community - Do I need to specify it in the properties file in my
spark-submit ?

 

From: Mich Talebzadeh [mailto:m...@peridale.co.uk] 
Sent: Thursday, February 18, 2016 4:28 PM
To: Sutanu Das; user@spark.apache.org
Subject: RE: Spark History Server NOT showing Jobs with Hortonworks

 

The jobs are normally shown under :4040/jobs/ in a normal setup
not using any vendor's flavour

 

Dr Mich Talebzadeh

 

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

 


 

 

From: Sutanu Das [mailto:sd2...@att.com] 
Sent: 18 February 2016 22:22
To: user@spark.apache.org
Subject: Spark History Server NOT showing Jobs with Hortonworks

 

Hi Community,

 

Challenged with Spark issues with Hortonworks  (HDP 2.3.2_Spark 1.4.1) - The
Spark History Server is NOT showing the Spark Running Jobs in Local Mode 

 

The local-host:4040/app/v1 is ALSO not working

 

How can I look at my local Spark job?

 

 

# Generated by Apache Ambari. Fri Feb  5 00:37:06 2016



spark.history.kerberos.keytab none

spark.history.kerberos.principal none

spark.history.provider
org.apache.spark.deploy.yarn.history.YarnHistoryProvider

spark.history.ui.port 18080

spark.yarn.containerLauncherMaxThreads 25

spark.yarn.driver.memoryOverhead 2048

spark.yarn.executor.memoryOverhead 2048

spark.yarn.historyServer.address has-dal-0001.corp.wayport.net:18080

spark.yarn.max.executor.failures 3

spark.yarn.preserve.staging.files false

spark.yarn.queue default

spark.yarn.scheduler.heartbeat.interval-ms 5000

spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService

spark.yarn.submit.file.replication 3

 

History Server 

*   Timeline Service Location:
http://has-dal-0002.corp.wayport.net:8188/
*   Last Updated: Feb 18, 2016 10:09:12 PM UTC
*   Service Started: Feb 5, 2016 12:37:15 AM UTC
*   Current Time: Feb 18, 2016 10:10:46 PM UTC
*   Timeline Service: Timeline service is enabled
*   History Provider: Apache Hadoop YARN Timeline Service

 



RE: Spark History Server NOT showing Jobs with Hortonworks

2016-02-18 Thread Sutanu Das
Hi Mich, Community - Do I need to specify it in the properties file in my 
spark-submit ?

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Thursday, February 18, 2016 4:28 PM
To: Sutanu Das; user@spark.apache.org
Subject: RE: Spark History Server NOT showing Jobs with Hortonworks

The jobs are normally shown under :4040/jobs/ in a normal setup not 
using any vendor's flavour

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



From: Sutanu Das [mailto:sd2...@att.com]
Sent: 18 February 2016 22:22
To: user@spark.apache.org
Subject: Spark History Server NOT showing Jobs with Hortonworks

Hi Community,

Challenged with Spark issues with Hortonworks  (HDP 2.3.2_Spark 1.4.1) - The 
Spark History Server is NOT showing the Spark Running Jobs in Local Mode

The local-host:4040/app/v1 is ALSO not working

How can I look at my local Spark job?


# Generated by Apache Ambari. Fri Feb  5 00:37:06 2016

spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.history.ui.port 18080
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 2048
spark.yarn.executor.memoryOverhead 2048
spark.yarn.historyServer.address has-dal-0001.corp.wayport.net:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.submit.file.replication 3

History Server

  *   Timeline Service Location: http://has-dal-0002.corp.wayport.net:8188/
  *   Last Updated: Feb 18, 2016 10:09:12 PM UTC
  *   Service Started: Feb 5, 2016 12:37:15 AM UTC
  *   Current Time: Feb 18, 2016 10:10:46 PM UTC
  *   Timeline Service: Timeline service is enabled
  *   History Provider: Apache Hadoop YARN Timeline Service



RE: Spark History Server NOT showing Jobs with Hortonworks

2016-02-18 Thread Mich Talebzadeh
The jobs are normally shown under :4040/jobs/ in a normal setup
not using any vendor's flavour

 

Dr Mich Talebzadeh

 

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

 


 

 

From: Sutanu Das [mailto:sd2...@att.com] 
Sent: 18 February 2016 22:22
To: user@spark.apache.org
Subject: Spark History Server NOT showing Jobs with Hortonworks

 

Hi Community,

 

Challenged with Spark issues with Hortonworks  (HDP 2.3.2_Spark 1.4.1) - The
Spark History Server is NOT showing the Spark Running Jobs in Local Mode 

 

The local-host:4040/app/v1 is ALSO not working

 

How can I look at my local Spark job?

 

 

# Generated by Apache Ambari. Fri Feb  5 00:37:06 2016



spark.history.kerberos.keytab none

spark.history.kerberos.principal none

spark.history.provider
org.apache.spark.deploy.yarn.history.YarnHistoryProvider

spark.history.ui.port 18080

spark.yarn.containerLauncherMaxThreads 25

spark.yarn.driver.memoryOverhead 2048

spark.yarn.executor.memoryOverhead 2048

spark.yarn.historyServer.address has-dal-0001.corp.wayport.net:18080

spark.yarn.max.executor.failures 3

spark.yarn.preserve.staging.files false

spark.yarn.queue default

spark.yarn.scheduler.heartbeat.interval-ms 5000

spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService

spark.yarn.submit.file.replication 3

 

History Server 

*   Timeline Service Location:
http://has-dal-0002.corp.wayport.net:8188/
*   Last Updated: Feb 18, 2016 10:09:12 PM UTC
*   Service Started: Feb 5, 2016 12:37:15 AM UTC
*   Current Time: Feb 18, 2016 10:10:46 PM UTC
*   Timeline Service: Timeline service is enabled
*   History Provider: Apache Hadoop YARN Timeline Service

 



Spark History Server NOT showing Jobs with Hortonworks

2016-02-18 Thread Sutanu Das
Hi Community,

Challenged with Spark issues with Hortonworks  (HDP 2.3.2_Spark 1.4.1) - The 
Spark History Server is NOT showing the Spark Running Jobs in Local Mode

The local-host:4040/app/v1 is ALSO not working

How can I look at my local Spark job?


# Generated by Apache Ambari. Fri Feb  5 00:37:06 2016

spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.history.ui.port 18080
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 2048
spark.yarn.executor.memoryOverhead 2048
spark.yarn.historyServer.address has-dal-0001.corp.wayport.net:18080
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.submit.file.replication 3

History Server

  *   Timeline Service Location: http://has-dal-0002.corp.wayport.net:8188/
  *   Last Updated: Feb 18, 2016 10:09:12 PM UTC
  *   Service Started: Feb 5, 2016 12:37:15 AM UTC
  *   Current Time: Feb 18, 2016 10:10:46 PM UTC
  *   Timeline Service: Timeline service is enabled
  *   History Provider: Apache Hadoop YARN Timeline Service



Re: pyspark - spark history server

2016-02-05 Thread cs user
Hi Folks,

So the fix for me was to copy this file on the nodes built with Ambari:

/usr/hdp/2.3.4.0-3485/spark/lib/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

To this file on the client machine, external to the cluster:

/opt/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar

I tried this after reading:

https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3ccaaonq7v7cq4hqr2p9ez5ojucmyc+mo2ggh068cwh+qwt6sx...@mail.gmail.com%3E

So I assume this is a custom built jar which is not part of the official
distribution. Just wanted to post in case this helps another person.

Thanks!


On Fri, Feb 5, 2016 at 2:08 PM, cs user <acldstk...@gmail.com> wrote:

> Hi All,
>
> I'm having trouble getting a job to use the spark history server. We have
> a cluster configured with Ambari, if I run the job from one of the nodes
> within the Ambari configured cluster, everything works fine, the job
> appears in the spark history server.
>
> If I configure a client external to the cluster, running the same job, the
> history server is not used.
>
> When the job completes fine, I see these lines appear in the log:
>
>
> 16/02/05 11:57:22 INFO history.YarnHistoryService: Starting
> YarnHistoryService for application application_1453893909110_0108 attempt
> Some(appattempt_1453893909110_0108_01); state=1; endpoint=
> http://somehost:8188/ws/v1/timeline/; bonded to ATS=false;
> listening=false; batchSize=10; flush count=0; total number queued=0,
> processed=0; attempted entity posts=0 successful entity posts=0 failed
> entity posts=0; events dropped=0; app start event received=false; app end
> event received=false;
> 16/02/05 11:57:22 INFO history.YarnHistoryService: Spark events will be
> published to the Timeline service at http://somehost:8188/ws/v1/timeline/
>
>
> On the client which is external to the cluster, these lines do not appear
> in the logs. I have printed out spark context and attempted to match what
> is configured on the working job, with the failing job, all seems fine.
>
> These are the job settings:
>
> conf.set('spark.speculation','true')
> conf.set('spark.dynamicAllocation.enabled','false')
> conf.set('spark.shuffle.service.enabled','false')
> conf.set('spark.executor.instances', '4')
> conf.set('spark.akka.threads','4')
> conf.set('spark.dynamicAllocation.initialExecutors','4')
>
> conf.set('spark.history.provider','org.apache.spark.deploy.yarn.history.YarnHistoryProvider')
>
> conf.set('spark.yarn.services','org.apache.spark.deploy.yarn.history.YarnHistoryService')
> conf.set('spark.history.ui.port','18080')
> conf.set('spark.driver.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
> conf.set('spark.yarn.containerLauncherMaxThreads','25')
> conf.set('spark.yarn.driver.memoryOverhead','384')
> conf.set('spark.yarn.executor.memoryOverhead','384')
> conf.set('spark.yarn.historyServer.address','somehost:18080')
> conf.set('spark.yarn.max.executor.failures','3')
> conf.set('spark.yarn.preserve.staging.files','false')
> conf.set('spark.yarn.queue','default')
> conf.set('spark.yarn.scheduler.heartbeat.interval-ms','5000')
> conf.set('spark.yarn.submit.file.replication','3')
> conf.set('spark.yarn.am.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
> conf.set('spark.blockManager.port','9096')
> conf.set('spark.driver.port','9095')
> conf.set('spark.fileserver.port','9097')
>
> I am using the following tar.gz file to install spark on the node external
> to the cluster:
>
>
> http://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
>
> Will this version of spark have everything required to talk correctly to
> yarn and the spark history service?
>
> So it comes down to, the spark context settings appear to be exactly the
> same, there are no errors in the logs pointing to the job not being able to
> connect to anything, none of the ports are blocked, why is this not working
> when run external to the cluster?
>
> There is no kerberos security configured on the cluster.
>
> Thanks!
>
>
>
>
>
>
>
>


pyspark - spark history server

2016-02-05 Thread cs user
Hi All,

I'm having trouble getting a job to use the spark history server. We have a
cluster configured with Ambari, if I run the job from one of the nodes
within the Ambari configured cluster, everything works fine, the job
appears in the spark history server.

If I configure a client external to the cluster, running the same job, the
history server is not used.

When the job completes fine, I see these lines appear in the log:


16/02/05 11:57:22 INFO history.YarnHistoryService: Starting
YarnHistoryService for application application_1453893909110_0108 attempt
Some(appattempt_1453893909110_0108_01); state=1; endpoint=
http://somehost:8188/ws/v1/timeline/; bonded to ATS=false; listening=false;
batchSize=10; flush count=0; total number queued=0, processed=0; attempted
entity posts=0 successful entity posts=0 failed entity posts=0; events
dropped=0; app start event received=false; app end event received=false;
16/02/05 11:57:22 INFO history.YarnHistoryService: Spark events will be
published to the Timeline service at http://somehost:8188/ws/v1/timeline/


On the client which is external to the cluster, these lines do not appear
in the logs. I have printed out spark context and attempted to match what
is configured on the working job, with the failing job, all seems fine.

These are the job settings:

conf.set('spark.speculation','true')
conf.set('spark.dynamicAllocation.enabled','false')
conf.set('spark.shuffle.service.enabled','false')
conf.set('spark.executor.instances', '4')
conf.set('spark.akka.threads','4')
conf.set('spark.dynamicAllocation.initialExecutors','4')
conf.set('spark.history.provider','org.apache.spark.deploy.yarn.history.YarnHistoryProvider')
conf.set('spark.yarn.services','org.apache.spark.deploy.yarn.history.YarnHistoryService')
conf.set('spark.history.ui.port','18080')
conf.set('spark.driver.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
conf.set('spark.yarn.containerLauncherMaxThreads','25')
conf.set('spark.yarn.driver.memoryOverhead','384')
conf.set('spark.yarn.executor.memoryOverhead','384')
conf.set('spark.yarn.historyServer.address','somehost:18080')
conf.set('spark.yarn.max.executor.failures','3')
conf.set('spark.yarn.preserve.staging.files','false')
conf.set('spark.yarn.queue','default')
conf.set('spark.yarn.scheduler.heartbeat.interval-ms','5000')
conf.set('spark.yarn.submit.file.replication','3')
conf.set('spark.yarn.am.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
conf.set('spark.blockManager.port','9096')
conf.set('spark.driver.port','9095')
conf.set('spark.fileserver.port','9097')

I am using the following tar.gz file to install spark on the node external
to the cluster:

http://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz

Will this version of spark have everything required to talk correctly to
yarn and the spark history service?

So it comes down to, the spark context settings appear to be exactly the
same, there are no errors in the logs pointing to the job not being able to
connect to anything, none of the ports are blocked, why is this not working
when run external to the cluster?

There is no kerberos security configured on the cluster.

Thanks!


DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava
Hello All,

I am running the history server for a completed application. This
application was run with the following parameters

bin/spark-submit --class  --master local[2] --conf
spark.local.dir=/mnt/ --conf spark.eventLog.dir=/mnt/sparklog/ --conf
spark.eventLog.enabled=true --conf spark.ui.retainedJobs=1 --conf
spark.ui.retainedStages=1 

In the Spark Web UI (http://localhost:18080/), the DAG visualization of only
the most recent job is available. For rest of the jobs, I get the following
message 

No visualization information available for this job!
If this is an old job, its visualization metadata may have been cleaned up
over time.
You may consider increasing the value of spark.ui.retainedJobs and
spark.ui.retainedStages.

I did increase the retainedJobs and retainedStages value to 10,000. What
else should be done to retain the visualization information of all the
jobs/stages?

Thanks in advance.

Raghava.
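
One thing worth trying, if the limit is applied on the replay side: raise the retention on the history server process itself, not just in the job (a sketch using the standard SPARK_HISTORY_OPTS mechanism; whether the replayed UI honours these settings is version-dependent, so treat it as an experiment):

export SPARK_HISTORY_OPTS="-Dspark.ui.retainedJobs=10000 -Dspark.ui.retainedStages=10000"
./sbin/start-history-server.sh

The event log itself records every job; the retained* limits only govern how much of the rebuilt UI is kept in memory.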







Re: Is Spark History Server supported for Mesos?

2015-12-10 Thread Steve Loughran

On 9 Dec 2015, at 22:01, Kelvin Chu <2dot7kel...@gmail.com> wrote:

Spark on YARN can use History Server by setting the configuration 
spark.yarn.historyServer.address.


That's the stuff in SPARK-1537 which isn't actually built in yet.

But, I can't find similar config for Mesos. Is History Server supported by 
Spark on Mesos? Thanks.



you set up your jobs to log to HDFS and run the history server to poll and 
replay the logs that appear there.

see: http://spark.apache.org/docs/latest/monitoring.html




Is Spark History Server supported for Mesos?

2015-12-09 Thread Kelvin Chu
Spark on YARN can use History Server by setting the configuration
spark.yarn.historyServer.address. But, I can't find similar config for
Mesos. Is History Server supported by Spark on Mesos? Thanks.

Kelvin


spark history server + yarn log aggregation issue

2015-09-09 Thread michael.england
Hi,

I am running Spark-on-YARN on a secure cluster with yarn log aggregation set 
up. Once a job completes, when viewing stdout/stderr executor logs in the Spark 
history server UI it redirects me to the local nodemanager where a page appears 
for a second saying ‘Redirecting to log server….’ and then redirects me to the 
aggregated job history server log page. However, the aggregated job history 
page sends me to 
http://<jobhistoryserver>:<port>/jobhistory...
instead of https://..., causing odd characters to appear. If you manually 
specify https:// in the URL, this works as expected, however the page 
automatically refreshes and causes this to go back to http again.

I have set the yarn.log.server.url property in yarn-site.xml to include https:

<property>
  <name>yarn.log.server.url</name>
  <value>https://.domain.com:port/jobhistory/logs/</value>
</property>

I know this isn’t a Spark issue specifically, but I wondered if anyone else has 
experienced this issue and knows how to get around it?

Thanks,
Mike





Re: History server is not receiving any event

2015-08-29 Thread Akhil Das
Are you starting your history server?

./sbin/start-history-server.sh

You can read more here
http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact
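
For completeness, the minimal wiring is two pieces (a sketch; the namenode URI and path below are placeholders):

# conf/spark-defaults.conf on every node that submits jobs
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://master-host:8020/usr/local/hadoop/spark_log

# on the machine that runs the history server
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://master-host:8020/usr/local/hadoop/spark_log"
./sbin/start-history-server.sh

Both settings must point at the same directory, the directory must already exist, and the URI needs the real namenode port.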



Thanks
Best Regards

On Tue, Aug 25, 2015 at 1:07 AM, b.bhavesh b.borisan...@gmail.com wrote:

 Hi,

 I am working on streaming application.
 I tried to configure history server to persist the events of application in
 hadoop file system (hdfs). However, it is not logging any events.
 I am running Apache Spark 1.4.1 (pyspark) under Ubuntu 14.04 with three
 nodes.
 Here is my configuration:
 File: /usr/local/spark/conf/spark-defaults.conf   # in all three nodes
 spark.eventLog.enabled true
 spark.eventLog.dir hdfs://master-host:port/usr/local/hadoop/spark_log

 #in master node
 export

 SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=hdfs://host:port/usr/local/hadoop/spark_log

 Can someone give list of steps to configure history server.

 Thanks and regards,
 b.bhavesh









History server is not receiving any event

2015-08-24 Thread b.bhavesh
Hi,

I am working on streaming application. 
I tried to configure history server to persist the events of application in
hadoop file system (hdfs). However, it is not logging any events.
I am running Apache Spark 1.4.1 (pyspark) under Ubuntu 14.04 with three
nodes.
Here is my configuration:
File: /usr/local/spark/conf/spark-defaults.conf   # in all three nodes
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master-host:port/usr/local/hadoop/spark_log

# in master node
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://host:port/usr/local/hadoop/spark_log"

Can someone give list of steps to configure history server.

Thanks and regards,
b.bhavesh








RE: Web UI vs History Server Bugs

2015-06-23 Thread Evo Eftimov
Probably your application crashed or was terminated without invoking the
stop method of the Spark context. In such cases it doesn't create the empty
flag file which apparently tells the history server that it can safely show
the log data. Simply go to some of the other dirs of the history server to
see what the name of the flag file is, then create it manually in the
dirs of the missing apps; they will then appear in the history server UI.
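
A concrete sketch of that workaround on HDFS (the event-log root and application ids below are hypothetical; in the Spark 1.2/1.3 layout each application is a directory containing an empty APPLICATION_COMPLETE marker):

hdfs dfs -ls /spark-events/app-20150620-0001            # a healthy app dir, to confirm the marker name
hdfs dfs -touchz /spark-events/app-20150623-0007/APPLICATION_COMPLETE

The history server should then pick the application up on its next scan of the log directory.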

 

From: Steve Loughran [mailto:ste...@hortonworks.com] 
Sent: Monday, June 22, 2015 7:22 PM
To: Jonathon Cai
Cc: user@spark.apache.org
Subject: Re: Web UI vs History Server Bugs

 

well, I'm afraid you've reached the limits of my knowledge ... hopefully
someone else can answer 

 

On 22 Jun 2015, at 16:37, Jonathon Cai jonathon@yale.edu wrote:

 

No, what I'm seeing is that while the cluster is running, I can't see the
app info after the app is completed. That is to say, when I click on the
application name on master:8080, no info is shown. However, when I examine
the same file on the History Server, the application information opens fine.

 

On Sat, Jun 20, 2015 at 6:47 AM, Steve Loughran ste...@hortonworks.com
wrote:


 On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote:

 Hi,

 I am running this on Spark stand-alone mode. I find that when I examine
the
 web UI, a couple bugs arise:

 1. There is a discrepancy between the number denoting the duration of the
 application when I run the history server and the number given by the web
UI
 (default address is master:8080). I checked more specific details,
including
 task and stage durations (when clicking on the application), and these
 appear to be the same for both avenues.

 2. Sometimes the web UI on master:8080 is unable to display more specific
 information for an application that has finished (when clicking on the
 application), even when there is a log file in the appropriate directory.
 But when the history server is opened, it is able to read this file and
 output information.


There's a JIRA open on the history server caching incomplete work...if you
click on the link to a job while it's in progress, you don't get any updates
later.

does this sound like what you are seeing?

 

 



Re: Web UI vs History Server Bugs

2015-06-22 Thread Jonathon Cai
No, what I'm seeing is that while the cluster is running, I can't see the
app info after the app is completed. That is to say, when I click on the
application name on master:8080, no info is shown. However, when I examine
the same file on the History Server, the application information opens fine.

On Sat, Jun 20, 2015 at 6:47 AM, Steve Loughran ste...@hortonworks.com
wrote:


  On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote:
 
  Hi,
 
  I am running this on Spark stand-alone mode. I find that when I examine
 the
  web UI, a couple bugs arise:
 
  1. There is a discrepancy between the number denoting the duration of the
  application when I run the history server and the number given by the
 web UI
  (default address is master:8080). I checked more specific details,
 including
  task and stage durations (when clicking on the application), and these
  appear to be the same for both avenues.
 
  2. Sometimes the web UI on master:8080 is unable to display more specific
  information for an application that has finished (when clicking on the
  application), even when there is a log file in the appropriate directory.
  But when the history server is opened, it is able to read this file and
  output information.
 

 There's a JIRA open on the history server caching incomplete work...if you
 click on the link to a job while it's in progress, you don't get any
 updates later.

 does this sound like what you are seeing?




Re: Spark 1.4 History Server - HDP 2.2

2015-06-21 Thread Steve Loughran

 On 20 Jun 2015, at 17:37, Ashish Soni asoni.le...@gmail.com wrote:
 
 Can anyone help? I am getting the below error when I try to start the History
 Server.
 I do not see any org.apache.spark.deploy.yarn.history package inside the
 assembly jar; not sure how to get that.
 
 java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 
 
 Thanks,
 Ashish

That new class comes when the SPARK-1537 patch comes in...it's getting close. 
What's probably happening is that you've got a configuration with 
spark.history.provider set to that class, but not the implementation in your 
JAR. Unset that property in your spark configuration and all should be well.
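
Concretely, that is a one-line change in conf/spark-defaults.conf (a sketch; the property may instead live in SPARK_HISTORY_OPTS depending on how the server is launched):

# comment out or remove the provider your jar doesn't contain;
# the history server then falls back to the built-in FsHistoryProvider
# spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider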




Re: Web UI vs History Server Bugs

2015-06-20 Thread Steve Loughran

 On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote:
 
 Hi,
 
 I am running this on Spark stand-alone mode. I find that when I examine the
 web UI, a couple bugs arise:
 
 1. There is a discrepancy between the number denoting the duration of the
 application when I run the history server and the number given by the web UI
 (default address is master:8080). I checked more specific details, including
 task and stage durations (when clicking on the application), and these
 appear to be the same for both avenues.
 
 2. Sometimes the web UI on master:8080 is unable to display more specific
 information for an application that has finished (when clicking on the
 application), even when there is a log file in the appropriate directory.
 But when the history server is opened, it is able to read this file and
 output information. 
 

There's a JIRA open on the history server caching incomplete work...if you 
click on the link to a job while it's in progress, you don't get any updates 
later. 

does this sound like what you are seeing?





Spark 1.4 History Server - HDP 2.2

2015-06-20 Thread Ashish Soni
Can anyone help? I am getting the below error when I try to start the History
Server.
I do not see any org.apache.spark.deploy.yarn.history package inside the
assembly jar; not sure how to get that.

java.lang.ClassNotFoundException:
org.apache.spark.deploy.yarn.history.YarnHistoryProvider


Thanks,
Ashish


Re: Web UI vs History Server Bugs

2015-06-18 Thread Akhil Das
You could possibly open up a JIRA and shoot an email to the dev list.

Thanks
Best Regards

On Wed, Jun 17, 2015 at 11:40 PM, jcai jonathon@yale.edu wrote:

 Hi,

 I am running this on Spark stand-alone mode. I find that when I examine the
 web UI, a couple bugs arise:

 1. There is a discrepancy between the number denoting the duration of the
 application when I run the history server and the number given by the web
 UI
 (default address is master:8080). I checked more specific details,
 including
 task and stage durations (when clicking on the application), and these
 appear to be the same for both avenues.

 2. Sometimes the web UI on master:8080 is unable to display more specific
 information for an application that has finished (when clicking on the
 application), even when there is a log file in the appropriate directory.
 But when the history server is opened, it is able to read this file and
 output information.

 Any ideas on how to approach these?

 I am trying to do accurate performance measurements on Spark workloads. I
 believe these might be bugs.

 Thanks,

 Jonathon







Web UI vs History Server Bugs

2015-06-17 Thread jcai
Hi,

I am running this on Spark stand-alone mode. I find that when I examine the
web UI, a couple bugs arise:

1. There is a discrepancy between the number denoting the duration of the
application when I run the history server and the number given by the web UI
(default address is master:8080). I checked more specific details, including
task and stage durations (when clicking on the application), and these
appear to be the same for both avenues.

2. Sometimes the web UI on master:8080 is unable to display more specific
information for an application that has finished (when clicking on the
application), even when there is a log file in the appropriate directory.
But when the history server is opened, it is able to read this file and
output information. 

Any ideas on how to approach these?

I am trying to do accurate performance measurements on Spark workloads. I
believe these might be bugs.

Thanks,

Jonathon






Re: Spark History Server pointing to S3

2015-06-16 Thread Akhil Das
Not quite sure, but try pointing spark.history.fs.logDirectory to your
S3 bucket

Thanks
Best Regards

On Tue, Jun 16, 2015 at 6:26 PM, Gianluca Privitera 
gianluca.privite...@studio.unibo.it wrote:

 In Spark website it’s stated in the View After the Fact section (
 https://spark.apache.org/docs/latest/monitoring.html) that you can point
 the start-history-server.sh script to a directory in order to view the Web
 UI using the logs as data source.

 Is it possible to point that script to S3? Maybe from a EC2 instance?

 Thanks,

 Gianluca
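
A sketch of that setup, assuming the Hadoop S3 filesystem jars (e.g. hadoop-aws or the jets3t-based s3n support, depending on your build) are on the history server's classpath; bucket and path are placeholders:

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3n://my-bucket/spark-events"
./sbin/start-history-server.sh

Whether this actually works depends on the filesystem implementation available to your build; the follow-up below reports an exception from FsHistoryProvider when given an S3 path.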




Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
In Spark website it’s stated in the View After the Fact section 
(https://spark.apache.org/docs/latest/monitoring.html) that you can point the 
start-history-server.sh script to a directory in order to view the Web UI using 
the logs as data source.

Is it possible to point that script to S3? Maybe from a EC2 instance? 

Thanks,

Gianluca



Re: Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
It gives me an exception with org.apache.spark.deploy.history.FsHistoryProvider,
a problem with the file system. I can reproduce the exception if you want.
It perfectly works if I give a local path, I tested it in 1.3.0 version.

Gianluca

On 16 Jun 2015, at 15:08, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Not quite sure, but try pointing spark.history.fs.logDirectory to your S3 bucket

Thanks
Best Regards

On Tue, Jun 16, 2015 at 6:26 PM, Gianluca Privitera 
<gianluca.privite...@studio.unibo.it> wrote:
In Spark website it’s stated in the View After the Fact section 
(https://spark.apache.org/docs/latest/monitoring.html) that you can point the 
start-history-server.sh script to a directory in order to view the Web UI using 
the logs as data source.

Is it possible to point that script to S3? Maybe from a EC2 instance?

Thanks,

Gianluca





Re: spark eventLog and history server

2015-06-09 Thread Richard Marscher
Hi,

I don't have a complete answer to your questions but:

Removing the suffix does not solve the problem - unfortunately this is
true, the master web UI only tries to build out a Spark UI from the event
logs once, at the time the context is closed. If the event logs are
in-progress at this time, then you basically missed the opportunity.

Does it mean I don't need to start history server if I only use spark in
standalone mode? - Yes, you don't need to start the history server.
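
On the .inprogress point from the original question: the event log is only finalized when the underlying SparkContext stops, so shut the streaming context down with the flag that also stops the context. A sketch (jssc is a hypothetical JavaStreamingContext; the Scala StreamingContext has the same overload):

// stopSparkContext = true closes the event log, dropping the .inprogress suffix;
// stopGracefully = true lets in-flight batches finish first
jssc.stop(true, true);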

On Mon, Jun 8, 2015 at 7:57 PM, Du Li l...@yahoo-inc.com.invalid wrote:

 Event log is enabled in my spark streaming app. My code runs in standalone
 mode and the spark version is 1.3.1. I periodically stop and restart the
 streaming context by calling ssc.stop(). However, from the web UI, when
 clicking on a past job, it says the job is still in progress and does not
 show the event log. The event log files have suffix .inprogress. Removing
 the suffix does not solve the problem. Do I need to do anything here in
 order to view the event logs of finished jobs? Or do I need to stop ssc
 differently?

 In addition, the documentation seems to suggest history server is used for
 Mesos or YARN mode. Does it mean I don't need to start history server if I
 only use spark in standalone mode?

 Thanks,
 Du



spark eventLog and history server

2015-06-08 Thread Du Li
Event log is enabled in my spark streaming app. My code runs in standalone mode 
and the spark version is 1.3.1. I periodically stop and restart the streaming 
context by calling ssc.stop(). However, from the web UI, when clicking on a 
past job, it says the job is still in progress and does not show the event log. 
The event log files have suffix .inprogress. Removing the suffix does not solve 
the problem. Do I need to do anything here in order to view the event logs of 
finished jobs? Or do I need to stop ssc differently?
In addition, the documentation seems to suggest history server is used for 
Mesos or YARN mode. Does it mean I don't need to start history server if I only 
use spark in standalone mode?
Thanks,
Du

Re: View all user's application logs in history server

2015-05-27 Thread Jianshi Huang
No one using History server? :)

Am I the only one who needs to see all user's logs?

Jianshi

On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 Hi,

 I'm using Spark 1.4.0-rc1 and I'm using default settings for history
 server.

 But I can only see my own logs. Is it possible to view all user's logs?
 The permission is fine for the user group.

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github  Blog: http://huangjs.github.com/


Re: View all user's application logs in history server

2015-05-27 Thread Marcelo Vanzin
You may be the only one not seeing all the logs. Are you sure all the users
are writing to the same log directory? The HS can only read from a single
log directory.

On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 No one using History server? :)

 Am I the only one need to see all user's logs?

 Jianshi

 On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:

 Hi,

 I'm using Spark 1.4.0-rc1 and I'm using default settings for history
 server.

 But I can only see my own logs. Is it possible to view all user's logs?
 The permission is fine for the user group.

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




-- 
Marcelo


Re: View all user's application logs in history server

2015-05-27 Thread Jianshi Huang
Yes, all written to the same directory on HDFS.

Jianshi

On Wed, May 27, 2015 at 11:57 PM, Marcelo Vanzin van...@cloudera.com
wrote:

 You may be the only one not seeing all the logs. Are you sure all the
 users are writing to the same log directory? The HS can only read from a
 single log directory.

 On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:

 No one using History server? :)

 Am I the only one need to see all user's logs?

 Jianshi

 On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:

 Hi,

 I'm using Spark 1.4.0-rc1 and I'm using default settings for history
 server.

 But I can only see my own logs. Is it possible to view all user's logs?
 The permission is fine for the user group.

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




 --
 Marcelo




-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github  Blog: http://huangjs.github.com/


Re: View all user's application logs in history server

2015-05-27 Thread Marcelo Vanzin
Then:
- Are all files readable by the user running the history server?
- Did all applications call sc.stop() correctly (i.e. files do not have the
.inprogress suffix)?

Other than that, always look at the logs first, looking for any errors that
may be thrown.
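
Two quick checks from a shell, assuming the usual setup where all users log to one shared HDFS directory (path hypothetical):

hdfs dfs -ls /user/spark/applicationHistory    # look for .inprogress suffixes and for
                                               # owner/mode combinations the HS user can't read

Files written by other users with restrictive permissions are the usual reason only your own applications show up.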


On Wed, May 27, 2015 at 9:10 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:

 Yes, all written to the same directory on HDFS.

 Jianshi

 On Wed, May 27, 2015 at 11:57 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 You may be the only one not seeing all the logs. Are you sure all the
 users are writing to the same log directory? The HS can only read from a
 single log directory.

 On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:

 No one using History server? :)

 Am I the only one need to see all user's logs?

 Jianshi

 On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang jianshi.hu...@gmail.com
 wrote:

 Hi,

 I'm using Spark 1.4.0-rc1 and I'm using default settings for history
 server.

 But I can only see my own logs. Is it possible to view all user's logs?
 The permission is fine for the user group.

 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




 --
 Marcelo




 --
 Jianshi Huang

 LinkedIn: jianshi
 Twitter: @jshuang
 Github  Blog: http://huangjs.github.com/




-- 
Marcelo


View all user's application logs in history server

2015-05-20 Thread Jianshi Huang
Hi,

I'm using Spark 1.4.0-rc1 and I'm using default settings for history server.

But I can only see my own logs. Is it possible to view all user's logs? The
permission is fine for the user group.

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github  Blog: http://huangjs.github.com/


history server

2015-05-07 Thread Koert Kuipers
i am trying to launch the spark 1.3.1 history server on a secure cluster.

i can see in the logs that it successfully logs into kerberos, and it is
replaying all the logs, but i never see the log message that indicates the
web server is started (i should see something like "Successfully started
service on port 18080." or "Started HistoryServer at http://somehost:18080").
yet the daemon stays alive...

any idea why the history server would never start the web service?

thanks!


Re: history server

2015-05-07 Thread Shixiong Zhu
The history server may need several hours to start if you have a lot of
event logs. Is it stuck, or still replaying logs?

Best Regards,
Shixiong Zhu

2015-05-07 11:03 GMT-07:00 Marcelo Vanzin van...@cloudera.com:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:

 i am trying to launch the spark 1.3.1 history server on a secure cluster.

 i can see in the logs that it successfully logs into kerberos, and it is
 replaying all the logs, but i never see the log message that indicate the
 web server is started (i should see something like Successfully started
 service on port 18080. or Started HistoryServer at
 http://somehost:18080;). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo



Re: history server

2015-05-07 Thread Shixiong Zhu
SPARK-5522 is really cool. Didn't notice it.

Best Regards,
Shixiong Zhu

2015-05-07 11:36 GMT-07:00 Marcelo Vanzin van...@cloudera.com:

 That shouldn't be true in 1.3 (see SPARK-5522).

 On Thu, May 7, 2015 at 11:33 AM, Shixiong Zhu zsxw...@gmail.com wrote:

 The history server may need several hours to start if you have a lot of
 event logs. Is it stuck, or still replaying logs?

 Best Regards,
 Shixiong Zhu

 2015-05-07 11:03 GMT-07:00 Marcelo Vanzin van...@cloudera.com:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com
 wrote:

 i am trying to launch the spark 1.3.1 history server on a secure
 cluster.

 i can see in the logs that it successfully logs into kerberos, and it
 is replaying all the logs, but i never see the log message that indicate
 the web server is started (i should see something like Successfully
 started service on port 18080. or Started HistoryServer at
  http://somehost:18080). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo





 --
 Marcelo



Re: history server

2015-05-07 Thread Marcelo Vanzin
That shouldn't be true in 1.3 (see SPARK-5522).

On Thu, May 7, 2015 at 11:33 AM, Shixiong Zhu zsxw...@gmail.com wrote:

 The history server may need several hours to start if you have a lot of
 event logs. Is it stuck, or still replaying logs?

 Best Regards,
 Shixiong Zhu

 2015-05-07 11:03 GMT-07:00 Marcelo Vanzin van...@cloudera.com:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:

 i am trying to launch the spark 1.3.1 history server on a secure cluster.

 i can see in the logs that it successfully logs into kerberos, and it is
 replaying all the logs, but i never see the log message that indicate the
 web server is started (i should see something like Successfully started
 service on port 18080. or Started HistoryServer at
  http://somehost:18080). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo





-- 
Marcelo


Re: history server

2015-05-07 Thread Koert Kuipers
,
line=1177 (Interpreted frame)
 - org.apache.spark.scheduler.ReplayListenerBus.replay(java.io.InputStream,
java.lang.String) @bci=42, line=49 (Interpreted frame)
 - 
org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(org.apache.hadoop.fs.FileStatus,
org.apache.spark.scheduler.ReplayListenerBus) @bci=69, line=260
(Interpreted frame)
 -
org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(org.apache.hadoop.fs.FileStatus)
@bci=19, line=190 (Interpreted frame)
 -
org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(java.lang.Object)
@bci=5, line=188 (Interpreted frame)
 -
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
@bci=9, line=252 (Interpreted frame)
 -
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
@bci=2, line=252 (Interpreted frame)
 -
scala.collection.IndexedSeqOptimized$class.foreach(scala.collection.IndexedSeqOptimized,
scala.Function1) @bci=22, line=33 (Interpreted frame)
 - scala.collection.mutable.WrappedArray.foreach(scala.Function1) @bci=2,
line=35 (Interpreted frame)
 -
scala.collection.TraversableLike$class.flatMap(scala.collection.TraversableLike,
scala.Function1, scala.collection.generic.CanBuildFrom) @bci=17, line=252
(Interpreted frame)
 - scala.collection.AbstractTraversable.flatMap(scala.Function1,
scala.collection.generic.CanBuildFrom) @bci=3, line=104 (Interpreted frame)
 - org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs()
@bci=110, line=188 (Interpreted frame)
 - org.apache.spark.deploy.history.FsHistoryProvider.initialize() @bci=38,
line=116 (Interpreted frame)
 -
org.apache.spark.deploy.history.FsHistoryProvider.<init>(org.apache.spark.SparkConf)
@bci=214, line=99 (Interpreted frame)
 -
sun.reflect.NativeConstructorAccessorImpl.newInstance0(java.lang.reflect.Constructor,
java.lang.Object[]) @bci=0 (Interpreted frame)
 -
sun.reflect.NativeConstructorAccessorImpl.newInstance(java.lang.Object[])
@bci=72, line=57 (Interpreted frame)
 -
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(java.lang.Object[])
@bci=5, line=45 (Interpreted frame)
 - java.lang.reflect.Constructor.newInstance(java.lang.Object[]) @bci=79,
line=526 (Interpreted frame)
 - org.apache.spark.deploy.history.HistoryServer$.main(java.lang.String[])
@bci=89, line=185 (Interpreted frame)
 - org.apache.spark.deploy.history.HistoryServer.main(java.lang.String[])
@bci=4 (Interpreted frame)



On Thu, May 7, 2015 at 2:17 PM, Koert Kuipers ko...@tresata.com wrote:

 good idea i will take a look. it does seem to be spinning one cpu at
 100%...

 On Thu, May 7, 2015 at 2:03 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:

 i am trying to launch the spark 1.3.1 history server on a secure cluster.

 i can see in the logs that it successfully logs into kerberos, and it is
 replaying all the logs, but i never see the log message that indicates the
 web server is started (i should see something like "Successfully started
 service on port 18080." or "Started HistoryServer at
 http://somehost:18080"). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo





Re: history server

2015-05-07 Thread Koert Kuipers
.apply(java.lang.String)
 @bci=34, line=51 (Compiled frame)
  -
 org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$1.apply(java.lang.Object)
 @bci=5, line=49 (Compiled frame)
  - scala.collection.Iterator$class.foreach(scala.collection.Iterator,
 scala.Function1) @bci=16, line=743 (Compiled frame)
  - scala.collection.AbstractIterator.foreach(scala.Function1) @bci=2,
 line=1177 (Interpreted frame)
  -
 org.apache.spark.scheduler.ReplayListenerBus.replay(java.io.InputStream,
 java.lang.String) @bci=42, line=49 (Interpreted frame)
  - 
 org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(org.apache.hadoop.fs.FileStatus,
 org.apache.spark.scheduler.ReplayListenerBus) @bci=69, line=260
 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(org.apache.hadoop.fs.FileStatus)
 @bci=19, line=190 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(java.lang.Object)
 @bci=5, line=188 (Interpreted frame)
  -
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
 @bci=9, line=252 (Interpreted frame)
  -
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
 @bci=2, line=252 (Interpreted frame)
  -
 scala.collection.IndexedSeqOptimized$class.foreach(scala.collection.IndexedSeqOptimized,
 scala.Function1) @bci=22, line=33 (Interpreted frame)
  - scala.collection.mutable.WrappedArray.foreach(scala.Function1) @bci=2,
 line=35 (Interpreted frame)
  -
 scala.collection.TraversableLike$class.flatMap(scala.collection.TraversableLike,
 scala.Function1, scala.collection.generic.CanBuildFrom) @bci=17, line=252
 (Interpreted frame)
  - scala.collection.AbstractTraversable.flatMap(scala.Function1,
 scala.collection.generic.CanBuildFrom) @bci=3, line=104 (Interpreted frame)
  - org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs()
 @bci=110, line=188 (Interpreted frame)
  - org.apache.spark.deploy.history.FsHistoryProvider.initialize()
 @bci=38, line=116 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider.<init>(org.apache.spark.SparkConf)
 @bci=214, line=99 (Interpreted frame)
  -
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(java.lang.reflect.Constructor,
 java.lang.Object[]) @bci=0 (Interpreted frame)
  -
 sun.reflect.NativeConstructorAccessorImpl.newInstance(java.lang.Object[])
 @bci=72, line=57 (Interpreted frame)
  -
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(java.lang.Object[])
 @bci=5, line=45 (Interpreted frame)
  - java.lang.reflect.Constructor.newInstance(java.lang.Object[]) @bci=79,
 line=526 (Interpreted frame)
  -
 org.apache.spark.deploy.history.HistoryServer$.main(java.lang.String[])
 @bci=89, line=185 (Interpreted frame)
  - org.apache.spark.deploy.history.HistoryServer.main(java.lang.String[])
 @bci=4 (Interpreted frame)



 On Thu, May 7, 2015 at 2:17 PM, Koert Kuipers ko...@tresata.com wrote:

 good idea i will take a look. it does seem to be spinning one cpu at
 100%...

 On Thu, May 7, 2015 at 2:03 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com
 wrote:

 i am trying to launch the spark 1.3.1 history server on a secure
 cluster.

 i can see in the logs that it successfully logs into kerberos, and it
 is replaying all the logs, but i never see the log message that indicates
 the web server is started (i should see something like "Successfully
 started service on port 18080." or "Started HistoryServer at
 http://somehost:18080"). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo






 --
 Marcelo



Re: history server

2015-05-07 Thread Ankur Chauhan
Hi,

Sorry, this may be a little off topic, but I tried searching for docs on the
history server and couldn't really find much. Can someone point me to a doc or
give me a point of reference for the use and intent of the history server?


-- Ankur

 On 7 May 2015, at 12:06, Koert Kuipers ko...@tresata.com wrote:
 
 got it. thanks!
 
 On Thu, May 7, 2015 at 2:52 PM, Marcelo Vanzin van...@cloudera.com wrote:
 Ah, sorry, that's definitely what Shixiong mentioned. The patch I mentioned 
 did not make it into 1.3...
 
 On Thu, May 7, 2015 at 11:48 AM, Koert Kuipers ko...@tresata.com wrote:
 seems i got one thread spinning 100% for a while now, in 
 FsHistoryProvider.initialize(). maybe something is wrong with my logs on hdfs 
 that it's reading? or could it simply really take 30 mins to read all the 
 history on hdfs?
 
 jstack:
 
 Deadlock Detection:
 
 No deadlocks found.
 
 Thread 2272: (state = BLOCKED)
  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
 may be imprecise)
  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) 
 @bci=20, line=226 (Compiled frame)
  - 
 java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode,
  boolean, long) @bci=174, line=460 (Compiled frame)
  - 
 java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
  boolean, long) @bci=102, line=359 (Interpreted frame)
  - java.util.concurrent.SynchronousQueue.poll(long, 
 java.util.concurrent.TimeUnit) @bci=11, line=942 (Interpreted frame)
  - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=141, line=1068 
 (Interpreted frame)
  - 
 java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
  @bci=26, line=1130 (Interpreted frame)
  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 
 (Interpreted frame)
  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
 
 
 Thread 1986: (state = BLOCKED)
  - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
  - org.apache.hadoop.hdfs.PeerCache.run() @bci=41, line=250 (Interpreted 
 frame)
  - 
 org.apache.hadoop.hdfs.PeerCache.access$000(org.apache.hadoop.hdfs.PeerCache) 
 @bci=1, line=41 (Interpreted frame)
  - org.apache.hadoop.hdfs.PeerCache$1.run() @bci=4, line=119 (Interpreted 
 frame)
  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
 
 
 Thread 1970: (state = BLOCKED)
 
 
 Thread 1969: (state = BLOCKED)
  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
  - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=135 (Interpreted 
 frame)
  - java.lang.ref.ReferenceQueue.remove() @bci=2, line=151 (Interpreted frame)
  - java.lang.ref.Finalizer$FinalizerThread.run() @bci=36, line=209 
 (Interpreted frame)
 
 
 Thread 1968: (state = BLOCKED)
  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
  - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame)
  - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=133 
 (Interpreted frame)
 
 
 Thread 1958: (state = IN_VM)
  - java.lang.Throwable.fillInStackTrace(int) @bci=0 (Compiled frame; 
 information may be imprecise)
  - java.lang.Throwable.fillInStackTrace() @bci=16, line=783 (Compiled frame)
 - java.lang.Throwable.<init>(java.lang.String, java.lang.Throwable) @bci=24, 
 line=287 (Compiled frame)
 - java.lang.Exception.<init>(java.lang.String, java.lang.Throwable) @bci=3, 
 line=84 (Compiled frame)
 - org.json4s.package$MappingException.<init>(java.lang.String, 
 java.lang.Exception) @bci=13, line=56 (Compiled frame)
  - org.json4s.reflect.package$.fail(java.lang.String, java.lang.Exception) 
 @bci=6, line=96 (Compiled frame)
  - org.json4s.Extraction$.convert(org.json4s.JsonAST$JValue, 
 org.json4s.reflect.ScalaType, org.json4s.Formats, scala.Option) @bci=2447, 
 line=554 (Compiled frame)
  - org.json4s.Extraction$.extract(org.json4s.JsonAST$JValue, 
 org.json4s.reflect.ScalaType, org.json4s.Formats) @bci=796, line=331 
 (Compiled frame)
  - org.json4s.Extraction$.extract(org.json4s.JsonAST$JValue, 
 org.json4s.Formats, scala.reflect.Manifest) @bci=10, line=42 (Compiled frame)
  - org.json4s.Extraction$.extractOpt(org.json4s.JsonAST$JValue, 
 org.json4s.Formats, scala.reflect.Manifest) @bci=7, line=54 (Compiled frame)
  - org.json4s.ExtractableJsonAstNode.extractOpt(org.json4s.Formats, 
 scala.reflect.Manifest) @bci=9, line=40 (Compiled frame)
  - 
 org.apache.spark.util.JsonProtocol$.shuffleWriteMetricsFromJson(org.json4s.JsonAST$JValue)
  @bci=116, line=702 (Compiled frame)
  - 
 org.apache.spark.util.JsonProtocol$$anonfun$taskMetricsFromJson$2.apply(org.json4s.JsonAST$JValue)
  @bci=4, line=670 (Compiled frame)
  - 
 org.apache.spark.util.JsonProtocol$$anonfun$taskMetricsFromJson$2.apply(java.lang.Object)
  @bci=5, line=670 (Compiled frame)
  - scala.Option.map(scala.Function1) @bci=22, line=145 (Compiled frame)
  - 
 org.apache.spark.util.JsonProtocol$.taskMetricsFromJson
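
The trace above shows where the time goes: FsHistoryProvider.checkForLogs()
replays every event log through ReplayListenerBus, and JsonProtocol/json4s
deserialize each line of each log before the web UI comes up. A minimal,
hypothetical sketch of that shape (not Spark's actual implementation, just
the gist the frames suggest):

import scala.io.Source
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Each line of a Spark event log is one JSON-encoded SparkListenerEvent;
// replay is therefore CPU-bound JSON parsing, which matches one thread
// spinning at 100% while the web service stays down.
def replayEventLog(path: String): Unit = {
  val source = Source.fromFile(path)
  try {
    for (line <- source.getLines()) {
      val event: JValue = parse(line)
      // JsonProtocol would convert `event` back into a listener event here
    }
  } finally source.close()
}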

Re: history server

2015-05-07 Thread Marcelo Vanzin
)
  -
 org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$1.apply(java.lang.Object)
 @bci=5, line=49 (Compiled frame)
  - scala.collection.Iterator$class.foreach(scala.collection.Iterator,
 scala.Function1) @bci=16, line=743 (Compiled frame)
  - scala.collection.AbstractIterator.foreach(scala.Function1) @bci=2,
 line=1177 (Interpreted frame)
  -
 org.apache.spark.scheduler.ReplayListenerBus.replay(java.io.InputStream,
 java.lang.String) @bci=42, line=49 (Interpreted frame)
  - 
 org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(org.apache.hadoop.fs.FileStatus,
 org.apache.spark.scheduler.ReplayListenerBus) @bci=69, line=260
 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(org.apache.hadoop.fs.FileStatus)
 @bci=19, line=190 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$6.apply(java.lang.Object)
 @bci=5, line=188 (Interpreted frame)
  -
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
 @bci=9, line=252 (Interpreted frame)
  -
 scala.collection.TraversableLike$$anonfun$flatMap$1.apply(java.lang.Object)
 @bci=2, line=252 (Interpreted frame)
  -
 scala.collection.IndexedSeqOptimized$class.foreach(scala.collection.IndexedSeqOptimized,
 scala.Function1) @bci=22, line=33 (Interpreted frame)
  - scala.collection.mutable.WrappedArray.foreach(scala.Function1) @bci=2,
 line=35 (Interpreted frame)
  -
 scala.collection.TraversableLike$class.flatMap(scala.collection.TraversableLike,
 scala.Function1, scala.collection.generic.CanBuildFrom) @bci=17, line=252
 (Interpreted frame)
  - scala.collection.AbstractTraversable.flatMap(scala.Function1,
 scala.collection.generic.CanBuildFrom) @bci=3, line=104 (Interpreted frame)
  - org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs()
 @bci=110, line=188 (Interpreted frame)
  - org.apache.spark.deploy.history.FsHistoryProvider.initialize() @bci=38,
 line=116 (Interpreted frame)
  -
 org.apache.spark.deploy.history.FsHistoryProvider.<init>(org.apache.spark.SparkConf)
 @bci=214, line=99 (Interpreted frame)
  -
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(java.lang.reflect.Constructor,
 java.lang.Object[]) @bci=0 (Interpreted frame)
  -
 sun.reflect.NativeConstructorAccessorImpl.newInstance(java.lang.Object[])
 @bci=72, line=57 (Interpreted frame)
  -
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(java.lang.Object[])
 @bci=5, line=45 (Interpreted frame)
  - java.lang.reflect.Constructor.newInstance(java.lang.Object[]) @bci=79,
 line=526 (Interpreted frame)
  - org.apache.spark.deploy.history.HistoryServer$.main(java.lang.String[])
 @bci=89, line=185 (Interpreted frame)
  - org.apache.spark.deploy.history.HistoryServer.main(java.lang.String[])
 @bci=4 (Interpreted frame)



 On Thu, May 7, 2015 at 2:17 PM, Koert Kuipers ko...@tresata.com wrote:

 good idea i will take a look. it does seem to be spinning one cpu at
 100%...

 On Thu, May 7, 2015 at 2:03 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Can you get a jstack for the process? Maybe it's stuck somewhere.

 On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com
 wrote:

 i am trying to launch the spark 1.3.1 history server on a secure
 cluster.

 i can see in the logs that it successfully logs into kerberos, and it
 is replaying all the logs, but i never see the log message that indicates
 the web server is started (i should see something like "Successfully
 started service on port 18080." or "Started HistoryServer at
 http://somehost:18080"). yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




 --
 Marcelo






-- 
Marcelo


Re: history server

2015-05-07 Thread Marcelo Vanzin
Can you get a jstack for the process? Maybe it's stuck somewhere.

On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote:

 i am trying to launch the spark 1.3.1 history server on a secure cluster.

 i can see in the logs that it successfully logs into kerberos, and it is
 replaying all the logs, but i never see the log message that indicates the
 web server is started (i should see something like "Successfully started
 service on port 18080." or "Started HistoryServer at http://somehost:18080").
 yet the daemon stays alive...

 any idea why the history server would never start the web service?

 thanks!




-- 
Marcelo
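
(For reference: capturing the dump Marcelo asks for is just running
jstack <pid> > history-server.jstack against the history server's JVM; jps
will show the pid next to the main class name HistoryServer. Both tools ship
with the JDK, and <pid> here is a placeholder, not a value from this thread.)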


Re: history-server doesn't read logs which are on FS

2015-04-20 Thread Serega Sheypak
Thanks, it helped.
We can't use Spark 1.3 because Cassandra DSE doesn't support it.
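
For anyone who lands on this thread: the fix Imran describes below is simply
to stop the SparkContext when the job finishes, so the event log is marked
complete and the history server lists the application. A minimal sketch
(hypothetical app name; assumes spark.eventLog.enabled is set):

import org.apache.spark.{SparkConf, SparkContext}

object MyJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
    try {
      println(sc.parallelize(1 to 100).sum())  // placeholder work
    } finally {
      sc.stop()  // without this, the app never shows up as completed
    }
  }
}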

2015-04-17 21:48 GMT+02:00 Imran Rashid iras...@cloudera.com:

 are you calling sc.stop() at the end of your applications?

 The history server only displays completed applications, but if you don't
 call sc.stop(), it doesn't know that those applications have been stopped.

 Note that in spark 1.3, the history server can also display running
 applications (including completed applications, but that it thinks are
 still running), which improves things a little bit.

 On Fri, Apr 17, 2015 at 10:13 AM, Serega Sheypak serega.shey...@gmail.com
  wrote:

 Hi, I started the history-server.
 Here is the UI output:


- *Event log directory:* file:/var/log/spark/applicationHistory/

 No completed applications found!

 Did you specify the correct logging directory? Please verify your setting
 of spark.history.fs.logDirectory and whether you have the permissions to
 access it.
 It is also possible that your application did not run to completion or
 did not stop the SparkContext.

 Spark 1.2.0

 I goto node where server runs and:

 ls -la /var/log/spark/applicationHistory/

 total 44

 drwxrwxrwx 11 root  root4096 Apr 17 14:50 .

 drwxrwxrwx  3 cassandra root4096 Apr 16 15:31 ..

 drwxrwxrwx  2 vagrant   vagrant 4096 Apr 17 10:06 app-20150417100630-

 drwxrwxrwx  2 vagrant   vagrant 4096 Apr 17 11:01 app-20150417110140-0001

 drwxrwxrwx  2 vagrant   vagrant 4096 Apr 17 11:12 app-20150417111216-0002

 drwxrwxrwx  2 vagrant   vagrant 4096 Apr 17 11:14 app-20150417111441-0003

 drwxrwx---  2 vagrant   vagrant 4096 Apr 17 11:20
 *app-20150417112028-0004*

 drwxrwx---  2 vagrant   vagrant 4096 Apr 17 14:17
 *app-20150417141733-0005*

 drwxrwx---  2 vagrant   vagrant 4096 Apr 17 14:32
 *app-20150417143237-0006*

 drwxrwx---  2 vagrant   vagrant 4096 Apr 17 14:49
 *app-20150417144902-0007*

 drwxrwx---  2 vagrant   vagrant 4096 Apr 17 14:50
 *app-20150417145025-0008*


 So there are logs, but history-server doesn't want to display them.

 I've checked the workers; they also point to that dir. When I run an app, I
 see a new log appear.


 Here is history-server log output:

 vagrant@dsenode01:/usr/lib/spark/logs$ cat
 spark-root-org.apache.spark.deploy.history.HistoryServer-1-dsenode01.out

 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath

 Spark Command: java -cp
 ::/usr/lib/spark/sbin/../conf:/usr/lib/spark/lib/spark-assembly-1.2.0-hadoop2.4.0.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar
 -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true
 -Dspark.history.fs.logDirectory=/var/log/spark/applicationHistory
 -Dspark.eventLog.enabled=true -Xms512m -Xmx512m
 org.apache.spark.deploy.history.HistoryServer

 


 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties

 15/04/17 09:55:21 INFO HistoryServer: Registered signal handlers for
 [TERM, HUP, INT]

 15/04/17 09:55:21 INFO SecurityManager: Changing view acls to: root

 15/04/17 09:55:21 INFO SecurityManager: Changing modify acls to: root

 15/04/17 09:55:21 INFO SecurityManager: SecurityManager: authentication
 disabled; ui acls disabled; users with view permissions: Set(root); users
 with modify permissions: Set(root)

 15/04/17 09:55:22 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable

 15/04/17 09:55:24 INFO Utils: Successfully started service on port 18080.

 15/04/17 09:55:24 INFO HistoryServer: Started HistoryServer at
 http://dsenode01:18080


 What could be wrong with it?





Can not get executor's Log from Spark's History Server

2015-04-07 Thread donhoff_h
Hi, Experts


I run my Spark Cluster on Yarn. I used to get executors' Logs from Spark's 
History Server. But after I started my Hadoop jobhistory server and made 
configuration to aggregate logs of hadoop jobs to a HDFS directory, I found 
that I could not get spark's executors' Logs any more. Is there any solution so 
that I could get logs of my spark jobs from Spark History Server and get logs 
of my map-reduce jobs from Hadoop History Server? Many Thanks!


Following is the configuration I made in Hadoop yarn-site.xml
yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=/mr-history/agg-logs
yarn.log-aggregation.retain-seconds=259200
yarn.log-aggregation.retain-check-interval-seconds=-1

Re: Can not get executor's Log from Spark's History Server

2015-04-07 Thread Marcelo Vanzin
The Spark history server does not have the ability to serve executor
logs currently. You need to use the yarn logs command for that.
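
For example (the application ID is whatever YARN assigned the job; the one
below is hypothetical):

  yarn logs -applicationId application_1428400000000_0001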

On Tue, Apr 7, 2015 at 2:51 AM, donhoff_h 165612...@qq.com wrote:
 Hi, Experts

 I run my Spark Cluster on Yarn. I used to get executors' Logs from Spark's
 History Server. But after I started my Hadoop jobhistory server and made
 configuration to aggregate logs of hadoop jobs to a HDFS directory, I found
 that I could not get spark's executors' Logs any more. Is there any solution
 so that I could get logs of my spark jobs from Spark History Server and get
 logs of my map-reduce jobs from Hadoop History Server? Many Thanks!

 Following is the configuration I made in Hadoop yarn-site.xml
 yarn.log-aggregation-enable=true
 yarn.nodemanager.remote-app-log-dir=/mr-history/agg-logs
 yarn.log-aggregation.retain-seconds=259200
 yarn.log-aggregation.retain-check-interval-seconds=-1



-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark History Server : jobs link doesn't open

2015-03-26 Thread , Roy
In the log I found this:

2015-03-26 19:42:09,531 WARN org.eclipse.jetty.servlet.ServletHandler:
Error for /history/application_1425934191900_87572
org.spark-project.guava.common.util.concurrent.ExecutionError:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at 
org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)


thanks

On Thu, Mar 26, 2015 at 7:27 PM, , Roy rp...@njit.edu wrote:

 We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2

  The Jobs link on the Spark History Server doesn't open and shows the following
  message:

 HTTP ERROR: 500

 Problem accessing /history/application_1425934191900_87572. Reason:

 Server Error

 --
 *Powered by Jetty://*
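
The "GC overhead limit exceeded" in the trace above means the history server
JVM itself ran out of heap while loading that application's history. A common
remedy (my suggestion, not something stated in this thread) is to give the
daemon more heap in conf/spark-env.sh on the history server host and restart
it:

  export SPARK_DAEMON_MEMORY=4g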




Spark History Server : jobs link doesn't open

2015-03-26 Thread , Roy
We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2

The Jobs link on the Spark History Server doesn't open and shows the following
message:

HTTP ERROR: 500

Problem accessing /history/application_1425934191900_87572. Reason:

Server Error

--
*Powered by Jetty://*


Re: Spark History Server : jobs link doesn't open

2015-03-26 Thread Marcelo Vanzin
bcc: user@, cc: cdh-user@

I recommend using CDH's mailing list whenever you have a problem with CDH.

That being said, you haven't provided enough info to debug the
problem. Since you're using CM, you can easily go look at the History
Server's logs and see what the underlying error is.


On Thu, Mar 26, 2015 at 4:27 PM, , Roy rp...@njit.edu wrote:
 We have Spark on YARN, with Cloudera Manager 5.3.2 and CDH 5.3.2

 The Jobs link on the Spark History Server doesn't open and shows the following
 message:

 HTTP ERROR: 500

 Problem accessing /history/application_1425934191900_87572. Reason:

 Server Error

 
 Powered by Jetty://




-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Job History Server

2015-03-20 Thread Sean Owen
Uh, does that mean HDP shipped Marcelo's uncommitted patch from
SPARK-1537 anyway? Given the discussion there, that seems kinda
aggressive.

On Wed, Mar 18, 2015 at 8:49 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Those classes are not part of standard Spark. You may want to contact
 Hortonworks directly if they're suggesting you use those.

 On Wed, Mar 18, 2015 at 3:30 AM, patcharee patcharee.thong...@uni.no wrote:
 Hi,

 I am using spark 1.3. I would like to use Spark Job History Server. I added
 the following line into conf/spark-defaults.conf

 spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
 spark.history.provider
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 spark.yarn.historyServer.address  sandbox.hortonworks.com:19888

 But got Exception in thread "main" java.lang.ClassNotFoundException:
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider

 What class is really needed? How to fix it?

 Br,
 Patcharee

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




 --
 Marcelo

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Job History Server

2015-03-20 Thread Zhan Zhang
Hi Patcharee,

It is an alpha feature in the HDP distribution, integrating ATS with the Spark
history server. If you are using upstream Spark, you can configure it as usual
without these settings. But other related configuration is still mandatory,
such as the hdp.version settings.

Thanks.

Zhan Zhang
 
On Mar 18, 2015, at 3:30 AM, patcharee patcharee.thong...@uni.no wrote:

 Hi,
 
 I am using spark 1.3. I would like to use Spark Job History Server. I added 
 the following line into conf/spark-defaults.conf
 
 spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
 spark.history.provider 
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 spark.yarn.historyServer.address  sandbox.hortonworks.com:19888
 
  But got Exception in thread "main" java.lang.ClassNotFoundException: 
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 
 What class is really needed? How to fix it?
 
 Br,
 Patcharee
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Job History Server

2015-03-18 Thread Akhil Das
You can simply turn it on using:

./sbin/start-history-server.sh


Read more here: http://spark.apache.org/docs/1.3.0/monitoring.html


Thanks
Best Regards

On Wed, Mar 18, 2015 at 4:00 PM, patcharee patcharee.thong...@uni.no
wrote:

 Hi,

 I am using spark 1.3. I would like to use Spark Job History Server. I
 added the following line into conf/spark-defaults.conf

 spark.yarn.services org.apache.spark.deploy.yarn.
 history.YarnHistoryService
 spark.history.provider org.apache.spark.deploy.yarn.
 history.YarnHistoryProvider
 spark.yarn.historyServer.address  sandbox.hortonworks.com:19888

 But got Exception in thread "main" java.lang.ClassNotFoundException:
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider

 What class is really needed? How to fix it?

 Br,
 Patcharee

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
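
For a stock (non-HDP) Spark, the history server needs no extra provider
classes at all; a minimal conf/spark-defaults.conf sketch (the HDFS path is a
placeholder) is:

  spark.eventLog.enabled           true
  spark.eventLog.dir               hdfs:///spark-history
  spark.history.fs.logDirectory    hdfs:///spark-history

after which ./sbin/start-history-server.sh works as Akhil describes.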




Re: Spark Job History Server

2015-03-18 Thread patcharee

I turned it on. But it failed to start. In the log,

Spark assembly has been built with Hive, including Datanucleus jars on 
classpath
Spark Command: /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp 
:/root/spark-1.3.0-bin-hadoop2.4/sbin/../conf:/root/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/etc/hadoop/conf 
-XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m 
-Xmx512m org.apache.spark.deploy.history.HistoryServer



15/03/18 10:23:46 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ClassNotFoundException: 
org.apache.spark.deploy.yarn.history.YarnHistoryProvider

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at 
org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:183)
at 
org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)


Patcharee

On 18. mars 2015 11:35, Akhil Das wrote:

You can simply turn it on using:
./sbin/start-history-server.sh

Read more here: http://spark.apache.org/docs/1.3.0/monitoring.html


Thanks
Best Regards

On Wed, Mar 18, 2015 at 4:00 PM, patcharee patcharee.thong...@uni.no wrote:


Hi,

I am using spark 1.3. I would like to use Spark Job History
Server. I added the following line into conf/spark-defaults.conf

spark.yarn.services
org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.history.provider
org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.historyServer.address sandbox.hortonworks.com:19888

But got Exception in thread "main"
java.lang.ClassNotFoundException:
org.apache.spark.deploy.yarn.history.YarnHistoryProvider

What class is really needed? How to fix it?

Br,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org






Re: Spark Job History Server

2015-03-18 Thread Akhil Das
The YARN package is not on your classpath. You need to build your Spark with
YARN support (see the build-command sketch after the quoted message below).
You can read these docs:
http://spark.apache.org/docs/1.3.0/running-on-yarn.html

Thanks
Best Regards

On Wed, Mar 18, 2015 at 4:07 PM, patcharee patcharee.thong...@uni.no
wrote:

  I turned it on. But it failed to start. In the log,

 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath
 Spark Command: /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp
 :/root/spark-1.3.0-bin-hadoop2.4/sbin/../conf:/root/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/etc/hadoop/conf
 -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m
 org.apache.spark.deploy.history.HistoryServer
 

 15/03/18 10:23:46 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 Exception in thread "main" java.lang.ClassNotFoundException:
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:191)
 at
 org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:183)
 at
 org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)

 Patcharee


 On 18. mars 2015 11:35, Akhil Das wrote:

  You can simply turn it on using:

 ./sbin/start-history-server.sh


  Read more here: http://spark.apache.org/docs/1.3.0/monitoring.html


  Thanks
 Best Regards

 On Wed, Mar 18, 2015 at 4:00 PM, patcharee patcharee.thong...@uni.no
 wrote:

 Hi,

 I am using spark 1.3. I would like to use Spark Job History Server. I
 added the following line into conf/spark-defaults.conf

 spark.yarn.services
 org.apache.spark.deploy.yarn.history.YarnHistoryService
 spark.history.provider
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider
 spark.yarn.historyServer.address  sandbox.hortonworks.com:19888

 But got Exception in thread "main" java.lang.ClassNotFoundException:
 org.apache.spark.deploy.yarn.history.YarnHistoryProvider

 What class is really needed? How to fix it?

 Br,
 Patcharee

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Spark Job History Server

2015-03-18 Thread patcharee

Hi,

I am using spark 1.3. I would like to use Spark Job History Server. I 
added the following line into conf/spark-defaults.conf


spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.history.provider 
org.apache.spark.deploy.yarn.history.YarnHistoryProvider

spark.yarn.historyServer.address  sandbox.hortonworks.com:19888

But got Exception in thread "main" java.lang.ClassNotFoundException: 
org.apache.spark.deploy.yarn.history.YarnHistoryProvider


What class is really needed? How to fix it?

Br,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


