Thanks all.

I think we are diverging, but IMO it is a worthwhile discussion.

Actually, threads are a hardware implementation - hence the whole notion of
“multi-threaded cores”. What happens is that the cores often have
duplicate registers, etc. for holding execution state. While it is
correct that only a single process is executing at a time, a single core
will have the execution states of multiple processes preserved in these
registers. In addition, it is the core (not the OS) that determines when
a thread is executed. The approach often varies according to the CPU
manufacturer, but the simplest one is this: when one thread of execution
issues a multi-cycle operation (e.g. a fetch from main memory), the
core simply stops processing that thread, saves its execution state to one
set of registers, resumes the other thread from the other set of registers
and goes on. On the Oracle SPARC chips, it will actually check the next
thread to see if the reason it was ‘parked’ has been resolved and, if not,
skip it for the subsequent thread. The OS is only aware of what are cores
and what are logical processors - and dispatches accordingly. *Execution is
up to the cores*.
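
As a rough illustration of the software side of this (what the JVM, and
therefore Spark, gets to see), here is a minimal Java sketch. It assumes, as
noted further down the thread, that local[*] is sized from
java.lang.Runtime.availableProcessors(). The class LogicalProcessors and its
helpers threadsFor and idleLogicalProcessors are hypothetical names made up
for this example, not Spark APIs, and the parsing only handles the bracketed
local[N] and local[*] forms.

```java
public class LogicalProcessors {

    // Hypothetical helper: number of worker threads implied by a Spark-style
    // master string such as "local[6]" or "local[*]". For "local[*]" it uses
    // the logical processor count passed in (assumption: this mirrors what
    // Spark derives from Runtime.availableProcessors()).
    public static int threadsFor(String master, int logicalProcessors) {
        String inner = master.substring(master.indexOf('[') + 1, master.indexOf(']'));
        return inner.equals("*") ? logicalProcessors : Integer.parseInt(inner);
    }

    // Hypothetical helper: logical processors left without a worker thread.
    public static int idleLogicalProcessors(String master, int logicalProcessors) {
        return Math.max(0, logicalProcessors - threadsFor(master, logicalProcessors));
    }

    public static void main(String[] args) {
        // What the JVM reports: logical processors, not physical cores.
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("JVM sees " + n + " logical processor(s)");
        System.out.println("local[*] -> " + threadsFor("local[*]", n) + " worker thread(s)");

        // The case from this thread: 12 logical processors, master local[6].
        System.out.println("local[6] on 12 logical processors leaves "
                + idleLogicalProcessors("local[6]", 12) + " idle");
    }
}
```

On the host described below (12 logical processors, 6 cores), local[6] would
give 6 worker threads and leave 6 logical processors without one. Note that
availableProcessors() can report fewer processors than the hardware has if
the JVM is restricted, e.g. by CPU affinity or a container limit.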

Cheers



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 16 June 2016 at 13:02, Robin East <robin.e...@xense.co.uk> wrote:

> Mich
>
> >> A core may have one or more threads
> It would be more accurate to say that a core could *run* one or more
> threads scheduled for execution. Threads are a software/OS concept that
> represents executable code scheduled to run by the OS; a CPU, core
> or virtual core/virtual processor executes that code. Threads are not CPUs
> or cores, whether physical or logical - any Spark documentation that implies
> this is mistaken. I’ve looked at the documentation you mention and I don’t
> read it to mean that threads are logical processors.
>
> To go back to your original question, if you set local[6] and you have 12
> logical processors then you are likely to have half your CPU resources
> unused by Spark.
>
>
> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> I think it is slightly more than that.
>
> These days software is licensed by core (generally speaking). That is
> the physical processor. *A core may have one or more threads - or
> logical processors*. Virtualization adds some fun to the mix. Generally
> what it presents is ‘virtual processors’. What that equates to depends
> on the virtualization layer itself. In some simpler VMs, it is
> virtual=logical. In others, virtual=logical but they are constrained to
> come from the same cores - e.g. if you get 6 virtual processors, it really
> is 3 full cores with 2 threads each. The rationale is the way OS
> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
> applications.
>
> HTH
>
> On 13 June 2016 at 18:17, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> I don't know what documentation you were referring to, but this is
>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>> too concerned about how any particular hardware vendor wants to refer to
>> the specific components of their CPU architecture.  For us, a core is a
>> logical execution unit, something on which a thread of execution can run.
>> That can map in different ways to different physical or virtual hardware.
>>
>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It is not a question of testing anything. I was referring to
>>> documentation that clearly uses the term "threads". As I said and showed
>>> before, one line uses the term "thread" and the next one "logical
>>> cores".
>>>
>>>
>>> HTH
>>>
>>>
>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>> daniel.dara...@lynxanalytics.com> wrote:
>>>
>>>> Spark is a software product. In software a "core" is something that a
>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>> A "thread" is not something a process can run on.)
>>>>
>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>> can test this easily.)
>>>>
>>>>
>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>
>>>>> It is about the terminology in the Spark docs and how to interpret it.
>>>>>
>>>>> This is my understanding of cores and threads.
>>>>>
>>>>> Cores are physical cores. Threads are virtual cores. A core with 2
>>>>> threads is what is called hyper-threading technology, so 2 threads per
>>>>> core lets the core work on two loads at the same time. In other words,
>>>>> every thread takes care of one load.
>>>>>
>>>>> Each core has its own memory. So if you have a dual core with hyper
>>>>> threading, each core works on 2 loads at the same time because of the 2
>>>>> threads per core, but these 2 threads will share the memory in that core.
>>>>>
>>>>> Some vendors, as I am sure most of you are aware, charge licensing per
>>>>> core.
>>>>>
>>>>> For example, on the same host where I have Spark, I have a SAP product
>>>>> that checks the licensing and shuts the application down if the license
>>>>> does not agree with the cores specified.
>>>>>
>>>>> This is what it says
>>>>>
>>>>> ./cpuinfo
>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>
>>>>> So here I have 12 logical processors, 6 cores and 1 chip. I call
>>>>> logical processors threads, so do I have 12 threads?
>>>>>
>>>>> Now if I go and start the worker processes with
>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this on the GUI page
>>>>>
>>>>> <image.png>
>>>>>
>>>>> It says 12 cores, but I gather these are threads?
>>>>>
>>>>> The Spark documentation
>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>> states, and I quote
>>>>>
>>>>> <image.png>
>>>>>
>>>>>
>>>>> OK, the line for local[k] adds  ..  *set this to the number of cores on
>>>>> your machine*
>>>>>
>>>>> But I know that it means threads, because if I went and set that to 6,
>>>>> it would be only 6 threads as opposed to 12 threads.
>>>>>
>>>>> The next line, local[*], seems to describe it correctly, as it refers to
>>>>> "logical cores", which in my understanding are threads.
>>>>>
>>>>> I trust that I am not nitpicking here!
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>>
