Spark loads data from HDFS or S3

2017-12-13 Thread Philip Lee
Hi,

I have a few questions about how HDFS and S3 are structured when a
Spark-like system loads data from these two storage systems.


Generally, when Spark loads data from HDFS, HDFS supports data locality and
the files are already distributed across the datanodes, right? So Spark can
just process the data on the workers.


What about S3? Many people in this field use S3 for storage or for loading
data remotely. When Spark loads data from S3 (sc.textFile('s3://...')), how
is the data spread across the workers? Is the master node responsible for
this task? Does it read all the data from S3 and then distribute it to the
workers? If so, that might be a trade-off compared to HDFS. Or have I
misunderstood this?

In what respects is S3 better than HDFS?

Thanks in advance.
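
One way to picture the answer, as a minimal sketch rather than Spark's
actual code: the driver only plans which byte range each task should read,
and every task then fetches its own range straight from the store, so the
data does not pass through the master. Data locality is the part HDFS can
add and a remote store like S3 cannot. The helper names, the 256-byte split
size, and the local temp file standing in for an S3 object are all
assumptions made for illustration.

```python
import os
import tempfile


def plan_splits(file_size, split_size):
    """Driver-side step: describe byte ranges only; no data is read here."""
    return [(offset, min(split_size, file_size - offset))
            for offset in range(0, file_size, split_size)]


def read_split(path, offset, length):
    """Task-side step: each task opens the source itself and reads its range."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A local temp file stands in for the remote object (hypothetical data).
payload = b"0123456789" * 100  # 1000 bytes
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

splits = plan_splits(len(payload), split_size=256)
# In a cluster each call below would run in a different task on a worker.
chunks = [read_split(path, offset, length) for offset, length in splits]
os.remove(path)

print(splits)
```

Real readers also align text splits to line boundaries, which this sketch
leaves out.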


reading ORC format on Spark-SQL

2016-02-10 Thread Philip Lee
What steps are involved when reading the ORC format in Spark SQL?
I mean, reading a CSV file usually just loads the dataset directly into
memory.

But I feel that Spark SQL goes through some extra steps when reading ORC.
For example, does it have to create a table and then insert the dataset
into that table? Are these steps part of reading in Spark SQL?

[image: Inline image 1]


Spark Distribution of Small Dataset

2016-01-28 Thread Philip Lee
Hi,

A simple question about how Spark distributes a small dataset.

Let's say I have a cluster of 8 machines with 48 cores and 48 GB of RAM.
The dataset (in ORC format, written by Hive) is small, about 1 GB, but I
copied it to HDFS.

1) If Spark SQL runs on this dataset, which is distributed across the
machines on HDFS, what happens to the job? Does one machine handle the
whole dataset because it is so small?

2) But the dataset is already distributed across the machines. Or does each
machine process its part of the distributed dataset and send the result to
the master node?

Could you explain in detail how this works in a distributed setting?

Best,
Phil
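
As a rough sketch of the arithmetic behind question 1), assuming the input
is cut into splits of about 128 MB (an assumed figure; the real split size
depends on the HDFS block size, the ORC stripe layout, and Spark settings),
a 1 GB dataset becomes several read tasks that can run on different
machines instead of one. `estimate_tasks` is a hypothetical helper, not a
Spark API.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def estimate_tasks(dataset_bytes, split_bytes):
    """Rough estimate: one read task per input split."""
    return math.ceil(dataset_bytes / split_bytes)


dataset_bytes = 1 * GB   # dataset size from the question
split_bytes = 128 * MB   # assumed split size, not taken from the post
machines = 8             # cluster size from the question

tasks = estimate_tasks(dataset_bytes, split_bytes)
print(f"{tasks} read tasks for {machines} machines")
```

Under these assumptions the work can be spread over the cluster; typically
only the result of an action such as collect() is sent back to the driver.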


Re: a question about web ui log

2016-01-26 Thread Philip Lee
Yes, I tried it, but it simply does not work.

So my concern is how *to use an "ssh tunnel" to forward a cluster port to a
localhost port.*

But the Spark UI has two ports that I need to forward using an "*ssh
tunnel*".
With the default ports, 8080 is the web UI port, and 4040 is the monitoring
port for the application details UI, which shows execution times and the
DAG, as you probably know.
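
A minimal sketch of forwarding both ports over one ssh connection, assuming
the master and the application's driver both run on the host you log in to.
The login `user@cluster-gateway` is a placeholder and `build_tunnel_command`
is a hypothetical helper; `-N` and `-L` are standard OpenSSH client options.

```python
import shlex


def build_tunnel_command(gateway, forwards):
    """Build an `ssh -N -L ...` argument list.

    forwards is a list of (local_port, remote_host, remote_port) tuples.
    """
    cmd = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for local_port, remote_host, remote_port in forwards:
        cmd += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    cmd.append(gateway)
    return cmd


cmd = build_tunnel_command(
    "user@cluster-gateway",        # placeholder login, not from the thread
    [(8080, "localhost", 8080),    # web UI port named in the message
     (4040, "localhost", 4040)],   # application details UI port
)
print(shlex.join(cmd))
```

Note that the UI on port 4040 is served by the running application's
driver, so the tunnel only reaches it while the job is still running.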

After a job finishes, I can see it in the list of jobs on the web UI on
port 8080, but when I click "application details UI" on port 4040 to see
the execution times, it does not work.

Any suggestions? I really need to see the execution of the DAG.

Best,
Phil

On Tue, Jan 26, 2016 at 12:04 AM, Mohammed Guller 
wrote:

> I am not sure whether you can copy the log files from Spark workers to
> your local machine and view it from the Web UI. In fact, if you are able to
> copy the log files locally, you can just view them directly in any text
> editor.
>
>
>
> I suspect what you really want to see is the application history. Here is
> the relevant information from Spark’s monitoring page (
> http://spark.apache.org/docs/latest/monitoring.html)
>
>
>
> To view the web UI after the fact, set spark.eventLog.enabled to true
> before starting the application. This configures Spark to log Spark events
> that encode the information displayed in the UI to persisted storage.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
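
A hedged sketch of the event-log settings that the quoted advice points to.
The three property names come from Spark's configuration and monitoring
docs; the `hdfs:///spark-events` directory and the `render_spark_defaults`
helper are assumptions for illustration. The output is meant for
conf/spark-defaults.conf.

```python
def render_spark_defaults(settings):
    """Render key/value pairs as spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


event_log_settings = {
    # Must be set before the application starts.
    "spark.eventLog.enabled": "true",
    # Hypothetical directory; it has to exist and be writable.
    "spark.eventLog.dir": "hdfs:///spark-events",
    # The history server reads finished applications from here.
    "spark.history.fs.logDirectory": "hdfs:///spark-events",
}

conf_text = render_spark_defaults(event_log_settings)
print(conf_text)
```

With these in place, a history server started with
sbin/start-history-server.sh serves the UI of finished applications, by
default on port 18080, which would then be the port to tunnel.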


Re: a question about web ui log

2016-01-25 Thread Philip Lee
As I mentioned before, I am trying to view the Spark log on a cluster via
an ssh tunnel.

1) The error on the application details UI probably comes from the
monitoring port, 4044. The web UI port is 8088, right? So how can I see
both the job web UI view and the application details UI view on my local
machine?

2) I am still wondering how to view the log after copying the log file to
my local machine.

The error was mentioned in my previous mail.

Thanks,
Phil





a question about web ui log

2016-01-25 Thread Philip Lee
Hello, a question about the web UI log.

I could see the web interface log after forwarding the port on my cluster
to my local machine and clicking the completed application, but when I
clicked "application detail UI", this happened:

[image: Inline image 1]

I do not know why. I also checked the specific log folder, and it has a log
file in it. Actually, that's why I could click the completed application
link, right?

So is it okay for me to copy the log file from my cluster to my local
machine? And after starting the Spark Job Manager on my local machine
myself, could I see the application details UI on my local machine?

Best,
Phil