[no subject]

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those who would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from people who are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as is
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with applications from a diverse range
of backgrounds; therefore, we encourage (as always) anyone thinking
about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not
wait until you know whether you have been accepted.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


[no subject]

2023-08-07 Thread Bode, Meikel
unsubscribe


[no subject]

2023-06-13 Thread Amanda Liu



[no subject]

2023-03-26 Thread Tanay Banerjee
unsubscribe


[no subject]

2023-03-21 Thread Tanay Banerjee
Unsubscribe


[no subject]

2023-03-06 Thread ansel boero
unsubscribe

subject

2022-11-24 Thread huldar chen
subject


[no subject]

2022-11-02 Thread yogita bhardwaj
I want to unsubscribe

Sent from Mail for Windows



[no subject]

2022-09-20 Thread yogita bhardwaj

I have installed PySpark using pip.
I am getting the following error while running this code:
from pyspark import SparkContext
sc=SparkContext()
a=sc.parallelize([1,2,3,4])
print(f"a_take:{a.take(2)}")

py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 
0) (DESKTOP-DR2QC97.mshome.net executor driver): 
org.apache.spark.SparkException: Python worker failed to connect back.
    at 
org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:189)
    at 
org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:109)
    at 
org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:124)
    at 
org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:164)
    at 
org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)

Can anyone please help me resolve this issue?
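
One common cause of "Python worker failed to connect back" with a pip-installed
PySpark on Windows is that the worker cannot locate the Python interpreter. A
minimal sketch of a possible workaround (the environment-variable settings are
an assumption about the setup, not a confirmed fix for this exact case):

```python
import os
import sys

# Point both the driver and the Python workers at the interpreter that has
# pyspark installed, before the SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark import SparkContext

sc = SparkContext()
a = sc.parallelize([1, 2, 3, 4])
print(f"a_take:{a.take(2)}")
```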



[no subject]

2021-05-03 Thread Tianchen Zhang
Hi all,

Currently the user-facing Catalog API doesn't support backup/restore
metadata. Our customers are asking for such functionalities. Here is a
usage example:
1. Read all metadata of one Spark cluster
2. Save them into a Parquet file on DFS
3. Read the Parquet file and restore all metadata in another Spark cluster

From the current implementation, the Catalog API has the list methods
(listDatabases, listFunctions, etc.), but they don't return enough
information to restore an entity (for example, listDatabases loses the
"properties" of the database, and we need "describe database extended" to
get them). And it only supports createTable (no other entity
creations). The only way we can back up/restore an entity is by using Spark SQL.

We want to introduce the backup and restore from an API level. We are
thinking of doing this simply by adding backup() and restore() in
CatalogImpl, as ExternalCatalog already includes all the methods we need to
retrieve and recreate entities. We are wondering if there is any concern or
drawback of this approach. Please advise.
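
A minimal sketch of what the proposed methods might look like (the object name,
signatures, and parameters below are illustrative assumptions for discussion,
not an agreed or existing API):

```scala
import org.apache.spark.sql.SparkSession

object CatalogBackupSketch {
  // Walk the catalog (databases, tables, functions), collect their full
  // metadata, and persist it, e.g. as Parquet on DFS.
  def backup(spark: SparkSession, path: String): Unit = ???

  // Read the metadata written by backup() and recreate each entity in the
  // target cluster's catalog.
  def restore(spark: SparkSession, path: String): Unit = ???
}
```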

Thank you in advance,
Tianchen


[no subject]

2021-03-26 Thread Domingo Mihovilovic



[no subject]

2021-03-10 Thread rahul c
Unsubscribe


[no subject]

2021-03-09 Thread Anton Solod
Unsubscribe


[no subject]

2021-01-20 Thread iriv kang
Unsubscribe


[no subject]

2021-01-09 Thread Christos Ziakas
Unsubscribe


[no subject]

2021-01-08 Thread Bhavya Jain
Unsubscribe


[no subject]

2021-01-08 Thread Chris Brown
Unsubscribe


[no subject]

2021-01-08 Thread Christos Ziakas
Unsubscribe


[no subject]

2021-01-07 Thread iriv kang
Unsubscribe


[no subject]

2021-01-07 Thread rahul c
Unsubscribe


[no subject]

2020-12-08 Thread Владимир Курятков
unsubscribe


[no subject]

2020-12-08 Thread rahul c
Unsubscribe


[no subject]

2020-12-02 Thread rahul c
Unsubscribe


[no subject]

2020-08-04 Thread Rohit Mishra
Hello Everyone,

Someone asked this question on JIRA, and since it was a question I requested
that he check Stack Overflow. Personally, I don't have an answer to this
question, so if anyone has an idea please feel free to update the
issue. I have marked it resolved for the time being but thought I would take
your opinion. Whenever you are free, the link is here -
https://issues.apache.org/jira/browse/SPARK-32527

Thanks in advance for your time.

Regards,
Rohit Mishra


Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-25 Thread Holden Karau
It sounds like with the slight wording change we’re in agreement so I’ll
bounce this by an editor friend to fix my grammar/spelling before I put it
up for a vote.

On Sat, Jul 25, 2020 at 9:23 PM Hyukjin Kwon  wrote:

> +1 thanks Holden.
>
> On Fri, 24 Jul 2020, 22:34 Tom Graves, 
> wrote:
>
>> +1
>>
>> Tom
>>
>> On Tuesday, July 21, 2020, 03:35:18 PM CDT, Holden Karau <
>> hol...@pigscanfly.ca> wrote:
>>
>>
>> Hi Spark Developers,
>>
>> There has been a rather active discussion regarding the specific vetoes
>> that occurred during Spark 3. From that I believe we are now mostly in
>> agreement that it would be best to clarify our rules around code vetoes &
>> merging in general. Personally I believe this change is important to help
>> improve the appearance of a level playing field in the project.
>>
>> Once discussion settles I'll run this by a copy editor, my grammar isn't
>> amazing, and bring forward for a vote.
>>
>> The current Spark committer guide is at
>> https://spark.apache.org/committers.html. I am proposing we add a
>> section on when it is OK to merge PRs directly above the section on how to
>> merge PRs. The text I am proposing to amend our committer guidelines with
>> is:
>>
>> PRs shall not be merged during active on topic discussion except for
>> issues like critical security fixes of a public vulnerability. Under
>> extenuating circumstances PRs may be merged during active off topic
>> discussion and the discussion directed to a more appropriate venue. Time
>> should be given prior to merging for those involved with the conversation
>> to explain if they believe they are on topic.
>>
>> Lazy consensus requires giving time for discussion to settle, while
>> understanding that people may not be working on Spark as their full time
>> job and may take holidays. It is believed that by doing this we can limit
>> how often people feel the need to exercise their veto.
>>
>> For the purposes of a -1 on code changes, a qualified voter includes all
>> PMC members and committers in the project. For a -1 to be a valid veto it
>> must include a technical reason. The reason can include things like the
>> change may introduce a maintenance burden or is not the direction of Spark.
>>
>> If there is a -1 from a non-committer, multiple committers or the PMC
>> should be consulted before moving forward.
>>
>>
>> If the original person who cast the veto can not be reached in a
>> reasonable time frame given likely holidays, it is up to the PMC to decide
>> the next steps within the guidelines of the ASF. This must be decided by a
>> consensus vote under the ASF voting rules.
>>
>> These policies serve to reiterate the core principle that code must not
>> be merged with a pending veto or before a consensus has been reached (lazy
>> or otherwise).
>>
>> It is the PMC’s hope that vetoes continue to be infrequent, and when they
>> occur all parties take the time to build consensus prior to additional
>> feature work.
>>
>>
>> Being a committer means exercising your judgement, while working in a
>> community with diverse views. There is nothing wrong in getting a second
>> (or 3rd or 4th) opinion when you are uncertain. Thank you for your
>> dedication to the Spark project, it is appreciated by the developers and
>> users of Spark.
>>
>>
>> It is hoped that these guidelines do not slow down development, rather by
>> removing some of the uncertainty that makes it easier for us to reach
>> consensus. If you have ideas on how to improve these guidelines, or other
>> parts of how the Spark project operates you should reach out on the dev@
>> list to start the discussion.
>>
>>
>>
>> Kind Regards,
>>
>> Holden
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-25 Thread Hyukjin Kwon
+1 thanks Holden.

On Fri, 24 Jul 2020, 22:34 Tom Graves,  wrote:

> +1
>
> Tom
>
> On Tuesday, July 21, 2020, 03:35:18 PM CDT, Holden Karau <
> hol...@pigscanfly.ca> wrote:
>
>
> Hi Spark Developers,
>
> There has been a rather active discussion regarding the specific vetoes
> that occurred during Spark 3. From that I believe we are now mostly in
> agreement that it would be best to clarify our rules around code vetoes &
> merging in general. Personally I believe this change is important to help
> improve the appearance of a level playing field in the project.
>
> Once discussion settles I'll run this by a copy editor, my grammar isn't
> amazing, and bring forward for a vote.
>
> The current Spark committer guide is at
> https://spark.apache.org/committers.html. I am proposing we add a section
> on when it is OK to merge PRs directly above the section on how to merge
> PRs. The text I am proposing to amend our committer guidelines with is:
>
> PRs shall not be merged during active on topic discussion except for
> issues like critical security fixes of a public vulnerability. Under
> extenuating circumstances PRs may be merged during active off topic
> discussion and the discussion directed to a more appropriate venue. Time
> should be given prior to merging for those involved with the conversation
> to explain if they believe they are on topic.
>
> Lazy consensus requires giving time for discussion to settle, while
> understanding that people may not be working on Spark as their full time
> job and may take holidays. It is believed that by doing this we can limit
> how often people feel the need to exercise their veto.
>
> For the purposes of a -1 on code changes, a qualified voter includes all
> PMC members and committers in the project. For a -1 to be a valid veto it
> must include a technical reason. The reason can include things like the
> change may introduce a maintenance burden or is not the direction of Spark.
>
> If there is a -1 from a non-committer, multiple committers or the PMC
> should be consulted before moving forward.
>
>
> If the original person who cast the veto can not be reached in a
> reasonable time frame given likely holidays, it is up to the PMC to decide
> the next steps within the guidelines of the ASF. This must be decided by a
> consensus vote under the ASF voting rules.
>
> These policies serve to reiterate the core principle that code must not be
> merged with a pending veto or before a consensus has been reached (lazy or
> otherwise).
>
> It is the PMC’s hope that vetoes continue to be infrequent, and when they
> occur all parties take the time to build consensus prior to additional
> feature work.
>
>
> Being a committer means exercising your judgement, while working in a
> community with diverse views. There is nothing wrong in getting a second
> (or 3rd or 4th) opinion when you are uncertain. Thank you for your
> dedication to the Spark project, it is appreciated by the developers and
> users of Spark.
>
>
> It is hoped that these guidelines do not slow down development, rather by
> removing some of the uncertainty that makes it easier for us to reach
> consensus. If you have ideas on how to improve these guidelines, or other
> parts of how the Spark project operates you should reach out on the dev@
> list to start the discussion.
>
>
>
> Kind Regards,
>
> Holden
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-24 Thread Tom Graves
 +1
Tom
On Tuesday, July 21, 2020, 03:35:18 PM CDT, Holden Karau 
 wrote:  
 
 Hi Spark Developers,
There has been a rather active discussion regarding the specific vetoes that 
occurred during Spark 3. From that I believe we are now mostly in agreement that 
it would be best to clarify our rules around code vetoes & merging in general. 
Personally I believe this change is important to help improve the appearance of 
a level playing field in the project.
Once discussion settles I'll run this by a copy editor, my grammar isn't 
amazing, and bring forward for a vote.
The current Spark committer guide is at 
https://spark.apache.org/committers.html. I am proposing we add a section on 
when it is OK to merge PRs directly above the section on how to merge PRs. The 
text I am proposing to amend our committer guidelines with is:

PRs shall not be merged during active on topic discussion except for issues 
like critical security fixes of a public vulnerability. Under extenuating 
circumstances PRs may be merged during active off topic discussion and the 
discussion directed to a more appropriate venue. Time should be given prior to 
merging for those involved with the conversation to explain if they believe 
they are on topic.


Lazy consensus requires giving time for discussion to settle, while 
understanding that people may not be working on Spark as their full time job 
and may take holidays. It is believed that by doing this we can limit how often 
people feel the need to exercise their veto.


For the purposes of a -1 on code changes, a qualified voter includes all PMC 
members and committers in the project. For a -1 to be a valid veto it must 
include a technical reason. The reason can include things like the change may 
introduce a maintenance burden or is not the direction of Spark.


If there is a -1 from a non-committer, multiple committers or the PMC should be 
consulted before moving forward.




If the original person who cast the veto can not be reached in a reasonable 
time frame given likely holidays, it is up to the PMC to decide the next steps 
within the guidelines of the ASF. This must be decided by a consensus vote 
under the ASF voting rules.


These policies serve to reiterate the core principle that code must not be 
merged with a pending veto or before a consensus has been reached (lazy or 
otherwise).


It is the PMC’s hope that vetoes continue to be infrequent, and when they occur 
all parties take the time to build consensus prior to additional feature work.




Being a committer means exercising your judgement, while working in a community 
with diverse views. There is nothing wrong in getting a second (or 3rd or 4th) 
opinion when you are uncertain. Thank you for your dedication to the Spark 
project, it is appreciated by the developers and users of Spark.




It is hoped that these guidelines do not slow down development, rather by 
removing some of the uncertainty that makes it easier for us to reach 
consensus. If you have ideas on how to improve these guidelines, or other parts 
of how the Spark project operates you should reach out on the dev@ list to 
start the discussion.





Kind Regards,
Holden
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
YouTube Live Streams: https://www.youtube.com/user/holdenkarau  

Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Mridul Muralidharan
Thanks Holden, this version looks good to me.
+1

Regards,
Mridul


On Thu, Jul 23, 2020 at 3:56 PM Imran Rashid  wrote:

> Sure, that sounds good to me.  +1
>
> On Wed, Jul 22, 2020 at 1:50 PM Holden Karau  wrote:
>
>>
>>
>> On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org >
>> wrote:
>>
>>> Hi Holden,
>>>
>>> thanks for leading this discussion, I'm in favor in general.  I have one
>>> specific question -- these two sections seem to contradict each other
>>> slightly:
>>>
>>> > If there is a -1 from a non-committer, multiple committers or the PMC
>>> should be consulted before moving forward.
>>> >
>>> >If the original person who cast the veto can not be reached in a
>>> reasonable time frame given likely holidays, it is up to the PMC to decide
>>> the next steps within the guidelines of the ASF. This must be decided by a
>>> consensus vote under the ASF voting rules.
>>>
>>> I think the intent here is that if a *committer* gives a -1, then the
>>> PMC has to have a consensus vote?  And if a non-committer gives a -1, then
>>> multiple committers should be consulted?  How about combining those two
>>> into something like
>>>
>>> "All -1s with justification merit discussion.  A -1 from a non-committer
>>> can be overridden only with input from multiple committers.  A -1 from a
>>> committer requires a consensus vote of the PMC under ASF voting rules".
>>>
>> I can work with that although it wasn’t quite what I was originally going
>> for. I didn’t intend to have committer -1s be eligible for override. I
>> believe committers have demonstrated sufficient merit; they are the same as
>> PMC member -1s in our project.
>>
>> My aim was just if something weird happens (like say I had a pending -1
>> before my motorcycle crash last year) we go to the PMC and take a binding
>> vote on what to do, and most likely someone on the PMC will reach out to
>> the ASF for understanding around the guidelines.
>>
>> What about:
>>
>> All -1s with justification merit discussion.  A -1 from a non-committer
>> can be overridden only with input from multiple committers and suitable
>> time for any committer to raise concerns.  A -1 from a committer who can
>> not be reached requires a consensus vote of the PMC under ASF voting rules
>> to determine the next steps within the ASF guidelines for vetos.
>>
>>>
>>>
>>> thanks,
>>> Imran
>>>
>>>
>>> On Tue, Jul 21, 2020 at 3:41 PM Holden Karau 
>>> wrote:
>>>
 Hi Spark Developers,

 There has been a rather active discussion regarding the specific vetoes
 that occurred during Spark 3. From that I believe we are now mostly in
 agreement that it would be best to clarify our rules around code vetoes &
 merging in general. Personally I believe this change is important to help
 improve the appearance of a level playing field in the project.

 Once discussion settles I'll run this by a copy editor, my grammar
 isn't amazing, and bring forward for a vote.

 The current Spark committer guide is at https://spark.apache.org/
 committers.html. I am proposing we add a section on when it is OK to
 merge PRs directly above the section on how to merge PRs. The text I am
 proposing to amend our committer guidelines with is:

 PRs shall not be merged during active on topic discussion except for
 issues like critical security fixes of a public vulnerability. Under
 extenuating circumstances PRs may be merged during active off topic
 discussion and the discussion directed to a more appropriate venue. Time
 should be given prior to merging for those involved with the conversation
 to explain if they believe they are on topic.

 Lazy consensus requires giving time for discussion to settle, while
 understanding that people may not be working on Spark as their full time
 job and may take holidays. It is believed that by doing this we can limit
 how often people feel the need to exercise their veto.

 For the purposes of a -1 on code changes, a qualified voter includes
 all PMC members and committers in the project. For a -1 to be a valid veto
 it must include a technical reason. The reason can include things like the
 change may introduce a maintenance burden or is not the direction of Spark.

 If there is a -1 from a non-committer, multiple committers or the PMC
 should be consulted before moving forward.


 If the original person who cast the veto can not be reached in a
 reasonable time frame given likely holidays, it is up to the PMC to decide
 the next steps within the guidelines of the ASF. This must be decided by a
 consensus vote under the ASF voting rules.

 These policies serve to reiterate the core principle that code must not
 be merged with a pending veto or before a consensus has been reached (lazy
 or otherwise).

 It is the PMC’s hope that vetoes continue to be infrequent, and when
 they 

Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Imran Rashid
Sure, that sounds good to me.  +1

On Wed, Jul 22, 2020 at 1:50 PM Holden Karau  wrote:

>
>
> On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org > wrote:
>
>> Hi Holden,
>>
>> thanks for leading this discussion, I'm in favor in general.  I have one
>> specific question -- these two sections seem to contradict each other
>> slightly:
>>
>> > If there is a -1 from a non-committer, multiple committers or the PMC
>> should be consulted before moving forward.
>> >
>> >If the original person who cast the veto can not be reached in a
>> reasonable time frame given likely holidays, it is up to the PMC to decide
>> the next steps within the guidelines of the ASF. This must be decided by a
>> consensus vote under the ASF voting rules.
>>
>> I think the intent here is that if a *committer* gives a -1, then the PMC
>> has to have a consensus vote?  And if a non-committer gives a -1, then
>> multiple committers should be consulted?  How about combining those two
>> into something like
>>
>> "All -1s with justification merit discussion.  A -1 from a non-committer
>> can be overridden only with input from multiple committers.  A -1 from a
>> committer requires a consensus vote of the PMC under ASF voting rules".
>>
> I can work with that although it wasn’t quite what I was originally going
> for. I didn’t intend to have committer -1s be eligible for override. I
> believe committers have demonstrated sufficient merit; they are the same as
> PMC member -1s in our project.
>
> My aim was just if something weird happens (like say I had a pending -1
> before my motorcycle crash last year) we go to the PMC and take a binding
> vote on what to do, and most likely someone on the PMC will reach out to
> the ASF for understanding around the guidelines.
>
> What about:
>
> All -1s with justification merit discussion.  A -1 from a non-committer
> can be overridden only with input from multiple committers and suitable
> time for any committer to raise concerns.  A -1 from a committer who can
> not be reached requires a consensus vote of the PMC under ASF voting rules
> to determine the next steps within the ASF guidelines for vetos.
>
>>
>>
>> thanks,
>> Imran
>>
>>
>> On Tue, Jul 21, 2020 at 3:41 PM Holden Karau 
>> wrote:
>>
>>> Hi Spark Developers,
>>>
>>> There has been a rather active discussion regarding the specific vetoes
>>> that occurred during Spark 3. From that I believe we are now mostly in
>>> agreement that it would be best to clarify our rules around code vetoes &
>>> merging in general. Personally I believe this change is important to help
>>> improve the appearance of a level playing field in the project.
>>>
>>> Once discussion settles I'll run this by a copy editor, my grammar isn't
>>> amazing, and bring forward for a vote.
>>>
>>> The current Spark committer guide is at https://spark.apache.org/
>>> committers.html. I am proposing we add a section on when it is OK to
>>> merge PRs directly above the section on how to merge PRs. The text I am
>>> proposing to amend our committer guidelines with is:
>>>
>>> PRs shall not be merged during active on topic discussion except for
>>> issues like critical security fixes of a public vulnerability. Under
>>> extenuating circumstances PRs may be merged during active off topic
>>> discussion and the discussion directed to a more appropriate venue. Time
>>> should be given prior to merging for those involved with the conversation
>>> to explain if they believe they are on topic.
>>>
>>> Lazy consensus requires giving time for discussion to settle, while
>>> understanding that people may not be working on Spark as their full time
>>> job and may take holidays. It is believed that by doing this we can limit
>>> how often people feel the need to exercise their veto.
>>>
>>> For the purposes of a -1 on code changes, a qualified voter includes all
>>> PMC members and committers in the project. For a -1 to be a valid veto it
>>> must include a technical reason. The reason can include things like the
>>> change may introduce a maintenance burden or is not the direction of Spark.
>>>
>>> If there is a -1 from a non-committer, multiple committers or the PMC
>>> should be consulted before moving forward.
>>>
>>>
>>> If the original person who cast the veto can not be reached in a
>>> reasonable time frame given likely holidays, it is up to the PMC to decide
>>> the next steps within the guidelines of the ASF. This must be decided by a
>>> consensus vote under the ASF voting rules.
>>>
>>> These policies serve to reiterate the core principle that code must not
>>> be merged with a pending veto or before a consensus has been reached (lazy
>>> or otherwise).
>>>
>>> It is the PMC’s hope that vetoes continue to be infrequent, and when
>>> they occur all parties take the time to build consensus prior to additional
>>> feature work.
>>>
>>>
>>> Being a committer means exercising your judgement, while working in a
>>> community with diverse views. There is nothing wrong in 

Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-22 Thread Holden Karau
On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org > wrote:

> Hi Holden,
>
> thanks for leading this discussion, I'm in favor in general.  I have one
> specific question -- these two sections seem to contradict each other
> slightly:
>
> > If there is a -1 from a non-committer, multiple committers or the PMC
> should be consulted before moving forward.
> >
> >If the original person who cast the veto can not be reached in a
> reasonable time frame given likely holidays, it is up to the PMC to decide
> the next steps within the guidelines of the ASF. This must be decided by a
> consensus vote under the ASF voting rules.
>
> I think the intent here is that if a *committer* gives a -1, then the PMC
> has to have a consensus vote?  And if a non-committer gives a -1, then
> multiple committers should be consulted?  How about combining those two
> into something like
>
> "All -1s with justification merit discussion.  A -1 from a non-committer
> can be overridden only with input from multiple committers.  A -1 from a
> committer requires a consensus vote of the PMC under ASF voting rules".
>
I can work with that although it wasn’t quite what I was originally going
for. I didn’t intend to have committer -1s be eligible for override. I
believe committers have demonstrated sufficient merit; they are the same as
PMC member -1s in our project.

My aim was just if something weird happens (like say I had a pending -1
before my motorcycle crash last year) we go to the PMC and take a binding
vote on what to do, and most likely someone on the PMC will reach out to
the ASF for understanding around the guidelines.

What about:

All -1s with justification merit discussion.  A -1 from a non-committer can
be overridden only with input from multiple committers and suitable time
for any committer to raise concerns.  A -1 from a committer who can not be
reached requires a consensus vote of the PMC under ASF voting rules to
determine the next steps within the ASF guidelines for vetos.

>
>
> thanks,
> Imran
>
>
> On Tue, Jul 21, 2020 at 3:41 PM Holden Karau  wrote:
>
>> Hi Spark Developers,
>>
>> There has been a rather active discussion regarding the specific vetoes
>> that occurred during Spark 3. From that I believe we are now mostly in
>> agreement that it would be best to clarify our rules around code vetoes &
>> merging in general. Personally I believe this change is important to help
>> improve the appearance of a level playing field in the project.
>>
>> Once discussion settles I'll run this by a copy editor, my grammar isn't
>> amazing, and bring forward for a vote.
>>
>> The current Spark committer guide is at https://spark.apache.org/
>> committers.html. I am proposing we add a section on when it is OK to
>> merge PRs directly above the section on how to merge PRs. The text I am
>> proposing to amend our committer guidelines with is:
>>
>> PRs shall not be merged during active on topic discussion except for
>> issues like critical security fixes of a public vulnerability. Under
>> extenuating circumstances PRs may be merged during active off topic
>> discussion and the discussion directed to a more appropriate venue. Time
>> should be given prior to merging for those involved with the conversation
>> to explain if they believe they are on topic.
>>
>> Lazy consensus requires giving time for discussion to settle, while
>> understanding that people may not be working on Spark as their full time
>> job and may take holidays. It is believed that by doing this we can limit
>> how often people feel the need to exercise their veto.
>>
>> For the purposes of a -1 on code changes, a qualified voter includes all
>> PMC members and committers in the project. For a -1 to be a valid veto it
>> must include a technical reason. The reason can include things like the
>> change may introduce a maintenance burden or is not the direction of Spark.
>>
>> If there is a -1 from a non-committer, multiple committers or the PMC
>> should be consulted before moving forward.
>>
>>
>> If the original person who cast the veto can not be reached in a
>> reasonable time frame given likely holidays, it is up to the PMC to decide
>> the next steps within the guidelines of the ASF. This must be decided by a
>> consensus vote under the ASF voting rules.
>>
>> These policies serve to reiterate the core principle that code must not
>> be merged with a pending veto or before a consensus has been reached (lazy
>> or otherwise).
>>
>> It is the PMC’s hope that vetoes continue to be infrequent, and when they
>> occur all parties take the time to build consensus prior to additional
>> feature work.
>>
>>
>> Being a committer means exercising your judgement, while working in a
>> community with diverse views. There is nothing wrong in getting a second
>> (or 3rd or 4th) opinion when you are uncertain. Thank you for your
>> dedication to the Spark project, it is appreciated by the developers and
>> users of Spark.
>>
>>
>> It is hoped that these 

Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-22 Thread Imran Rashid
Hi Holden,

thanks for leading this discussion, I'm in favor in general.  I have one
specific question -- these two sections seem to contradict each other
slightly:

> If there is a -1 from a non-committer, multiple committers or the PMC
should be consulted before moving forward.
>
>If the original person who cast the veto can not be reached in a
reasonable time frame given likely holidays, it is up to the PMC to decide
the next steps within the guidelines of the ASF. This must be decided by a
consensus vote under the ASF voting rules.

I think the intent here is that if a *committer* gives a -1, then the PMC
has to have a consensus vote?  And if a non-committer gives a -1, then
multiple committers should be consulted?  How about combining those two
into something like

"All -1s with justification merit discussion.  A -1 from a non-committer
can be overridden only with input from multiple committers.  A -1 from a
committer requires a consensus vote of the PMC under ASF voting rules".


thanks,
Imran


On Tue, Jul 21, 2020 at 3:41 PM Holden Karau  wrote:

> Hi Spark Developers,
>
> There has been a rather active discussion regarding the specific vetoes
> that occurred during Spark 3. From that I believe we are now mostly in
> agreement that it would be best to clarify our rules around code vetoes &
> merging in general. Personally I believe this change is important to help
> improve the appearance of a level playing field in the project.
>
> Once discussion settles I'll run this by a copy editor, my grammar isn't
> amazing, and bring forward for a vote.
>
> The current Spark committer guide is at
> https://spark.apache.org/committers.html. I am proposing we add a section
> on when it is OK to merge PRs directly above the section on how to merge
> PRs. The text I am proposing to amend our committer guidelines with is:
>
> PRs shall not be merged during active on topic discussion except for
> issues like critical security fixes of a public vulnerability. Under
> extenuating circumstances PRs may be merged during active off topic
> discussion and the discussion directed to a more appropriate venue. Time
> should be given prior to merging for those involved with the conversation
> to explain if they believe they are on topic.
>
> Lazy consensus requires giving time for discussion to settle, while
> understanding that people may not be working on Spark as their full time
> job and may take holidays. It is believed that by doing this we can limit
> how often people feel the need to exercise their veto.
>
> For the purposes of a -1 on code changes, a qualified voter includes all
> PMC members and committers in the project. For a -1 to be a valid veto it
> must include a technical reason. The reason can include things like the
> change may introduce a maintenance burden or is not the direction of Spark.
>
> If there is a -1 from a non-committer, multiple committers or the PMC
> should be consulted before moving forward.
>
>
> If the original person who cast the veto can not be reached in a
> reasonable time frame given likely holidays, it is up to the PMC to decide
> the next steps within the guidelines of the ASF. This must be decided by a
> consensus vote under the ASF voting rules.
>
> These policies serve to reiterate the core principle that code must not be
> merged with a pending veto or before a consensus has been reached (lazy or
> otherwise).
>
> It is the PMC’s hope that vetoes continue to be infrequent, and when they
> occur all parties take the time to build consensus prior to additional
> feature work.
>
>
> Being a committer means exercising your judgement, while working in a
> community with diverse views. There is nothing wrong in getting a second
> (or 3rd or 4th) opinion when you are uncertain. Thank you for your
> dedication to the Spark project, it is appreciated by the developers and
> users of Spark.
>
>
> It is hoped that these guidelines do not slow down development, rather by
> removing some of the uncertainty that makes it easier for us to reach
> consensus. If you have ideas on how to improve these guidelines, or other
> parts of how the Spark project operates you should reach out on the dev@
> list to start the discussion.
>
>
>
> Kind Regards,
>
> Holden
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


[DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-21 Thread Holden Karau
Hi Spark Developers,

There has been a rather active discussion regarding the specific vetoes
that occurred during Spark 3. From that I believe we are now mostly in
agreement that it would be best to clarify our rules around code vetoes &
merging in general. Personally I believe this change is important to help
improve the appearance of a level playing field in the project.

Once discussion settles I'll run this by a copy editor, my grammar isn't
amazing, and bring forward for a vote.

The current Spark committer guide is at
https://spark.apache.org/committers.html. I am proposing we add a section
on when it is OK to merge PRs directly above the section on how to merge
PRs. The text I am proposing to amend our committer guidelines with is:

PRs shall not be merged during active on topic discussion except for issues
like critical security fixes of a public vulnerability. Under extenuating
circumstances PRs may be merged during active off topic discussion and the
discussion directed to a more appropriate venue. Time should be given prior
to merging for those involved with the conversation to explain if they
believe they are on topic.

Lazy consensus requires giving time for discussion to settle, while
understanding that people may not be working on Spark as their full time
job and may take holidays. It is believed that by doing this we can limit
how often people feel the need to exercise their veto.

For the purposes of a -1 on code changes, a qualified voter includes all
PMC members and committers in the project. For a -1 to be a valid veto it
must include a technical reason. The reason can include things like the
change may introduce a maintenance burden or is not the direction of Spark.

If there is a -1 from a non-committer, multiple committers or the PMC
should be consulted before moving forward.


If the original person who cast the veto can not be reached in a reasonable
time frame given likely holidays, it is up to the PMC to decide the next
steps within the guidelines of the ASF. This must be decided by a consensus
vote under the ASF voting rules.

These policies serve to reiterate the core principle that code must not be
merged with a pending veto or before a consensus has been reached (lazy or
otherwise).

It is the PMC’s hope that vetoes continue to be infrequent, and when they
occur all parties take the time to build consensus prior to additional
feature work.


Being a committer means exercising your judgement, while working in a
community with diverse views. There is nothing wrong in getting a second
(or 3rd or 4th) opinion when you are uncertain. Thank you for your
dedication to the Spark project, it is appreciated by the developers and
users of Spark.


It is hoped that these guidelines do not slow down development, rather by
removing some of the uncertainty that makes it easier for us to reach
consensus. If you have ideas on how to improve these guidelines, or other
parts of how the Spark project operates you should reach out on the dev@
list to start the discussion.



Kind Regards,

Holden

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


[no subject]

2020-07-02 Thread vtygoss
Hi devs,


Question: how can I convert a Hive output format to a Spark SQL data source format?


Spark version: Spark 2.3.0
Scene: there are many small files on HDFS (Hive) generated by Spark SQL
applications when dynamic partitioning is enabled or when
spark.sql.shuffle.partitions is set above 200. So I am trying to develop a new
feature: after temporary files have been written to HDFS but before they have
been moved to the final path, calculate the ideal file number from dfs.blocksize
and the temporary files' total length, then merge (coalesce/repartition) down to
that ideal file number. But I have run into a difficulty: the temporary files are
written in the output format (e.g. org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat)
defined in the Hive TableDesc, and I can't load the temporary files with


```
sparkSession
  .read.format(TableDesc.getInputFormatClassName)
  .load(tempDataPath)
  .repartition(idealFileNumber)  // the ideal file number computed above
  .write.format(TableDesc.getOutputFormatClassName)
```
This throws "xxx is not a valid Spark SQL Data Source" at
DataSource#resolveRelation.
I also tried to use


```
sparkSession.read
  .option("inputFormat", TableDesc.getInputFormatClassName)
  .option("outputFormat", TableDesc.getOutputFormatClassName)
  .load(tempDataPath)
  ….
```


It does not work either, and the Spark SQL DataSource defaults to parquet.


So how can I convert a Hive output format to a Spark SQL data source format? Is there
any better way than building a map?
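
A minimal sketch of the mapping approach, in case it helps (the Hive class names
below are standard, but the map itself and the parquet fallback are illustrative
assumptions, not an existing Spark API):

```scala
// Translate a Hive OutputFormat class name into a Spark SQL data source short name.
val hiveOutputFormatToDataSource: Map[String, String] = Map(
  "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"               -> "orc",
  "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat" -> "parquet",
  "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat"    -> "avro",
  "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"     -> "text"
)

def toDataSourceFormat(outputFormatClassName: String): String =
  hiveOutputFormatToDataSource.getOrElse(outputFormatClassName, "parquet") // assumed default
```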




Thanks in advance

[no subject]

2020-02-02 Thread Stepan Tuchin
Unsubscribe
-- 


Stepan Tuchin, Automation Quality Engineer

Grid Dynamics

Vavilova, 38/114, Saratov

Dir: +7 (902) 047-55-55


[no subject]

2020-01-14 Thread @Sanjiv Singh
Regards
Sanjiv Singh
Mob :  +1 571-599-5236


[no subject]

2019-12-20 Thread Driesprong, Fokko
Folks,

A while ago I opened a PR that adds the possibility to merge a
custom data type into a native data type. This is something new because of
the introduction of Delta.

For some background, I have a Dataset that has fields of the type
XMLGregorianCalendarType. I don't care about this type and would like to
convert it to a standard data type, mainly because if I read the
data again from another job, that job needs to have the custom data type
registered, which is not possible in the SQL API. The magic bit here is
that I'm overriding jsonValue to lose the information about the custom
data type. In this case, you have to make sure that it is serialized as a
normal timestamp.

Before Delta, when appending to the table, everything would go fine because
it would not check compatibility on write. Now with Delta, things are
different. When writing, it will check if the two structures can be merged:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support
was removed in 8.0
Warning: Ignoring non-spark config property:
eventLog.rolloverIntervalSeconds=3600
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed
to merge fields 'EventTimestamp' and 'EventTimestamp'. Failed to merge
incompatible data types TimestampType and
org.apache.spark.sql.types.CustomXMLGregorianCalendarType@6334178e;;
at
com.databricks.sql.transaction.tahoe.schema.SchemaUtils$$anonfun$18.apply(SchemaUtils.scala:685)
at
com.databricks.sql.transaction.tahoe.schema.SchemaUtils$$anonfun$18.apply(SchemaUtils.scala:674)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at
com.databricks.sql.transaction.tahoe.schema.SchemaUtils$.com$databricks$sql$transaction$tahoe$schema$SchemaUtils$$merge$1(SchemaUtils.scala:674)
at
com.databricks.sql.transaction.tahoe.schema.SchemaUtils$.mergeSchemas(SchemaUtils.scala:750)
at
com.databricks.sql.transaction.tahoe.schema.ImplicitMetadataOperation$class.updateMetadata(ImplicitMetadataOperation.scala:63)
at
com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.updateMetadata(WriteIntoDelta.scala:50)
at
com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:90)
at
com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:119)
at
com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand$$anonfun$run$2.apply(CreateDeltaTableCommand.scala:93)
at
com.databricks.logging.UsageLogging$$anonfun$recordOperation$1.apply(UsageLogging.scala:405)
at
com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at
com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at
com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:18)
at
com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:272)
at
com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:18)
at
com.databricks.logging.UsageLogging$class.recordOperation(UsageLogging.scala:386)
at
com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:18)
at
com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:55)
at
com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:98)
at
com.databricks.spark.util.UsageLogger$class.recordOperation(UsageLogger.scala:67)
at
com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:67)
at
com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:342)
at
com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:45)
at
com.databricks.sql.transaction.tahoe.metering.DeltaLogging$class.recordDeltaOperation(DeltaLogging.scala:108)
at
com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:45)
at
com.databricks.sql.transaction.tahoe.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:93)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at

[no subject]

2019-04-02 Thread Uzi Hadad
unsubscribe


[no subject]

2019-04-02 Thread Daniel Sierra
unsubscribe


[no subject]

2019-03-06 Thread Dongxu Wang



[no subject]

2019-01-03 Thread marco rocchi
Unsubscribe me, please.

Thank you so much


[no subject]

2018-06-23 Thread Anbazhagan Muthuramalingam
Unsubscribe

Regards
M Anbazhagan
IT Analyst



[no subject]

2017-07-28 Thread Hao Chen
--

Hao


[no subject]

2017-01-19 Thread Keith Chapman
Hi ,

Is it possible for an executor (or slave) to know when an actual job ends?
I'm running spark on a cluster (with yarn) and my workers create some
temporary files that I would like to clean up once the job ends. Is there a
way for the worker to detect that a job has finished? I tried doing it in
the JobProgressListener but it does not seem to work in a cluster. The
event is not triggered in the worker.
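
Not an answer at the job level, but a minimal sketch of one executor-side
option: there is no job-end callback on the worker, while per-task cleanup can
be hooked with a task-completion listener (the scratch-file handling below is
an illustrative assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import org.apache.spark.util.TaskCompletionListener

object TaskCleanupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("task-cleanup-sketch"))
    sc.parallelize(1 to 100, 4).mapPartitions { iter =>
      // Hypothetical per-task scratch file created on the worker.
      val tmp = java.io.File.createTempFile("worker-scratch", ".tmp")
      TaskContext.get().addTaskCompletionListener(new TaskCompletionListener {
        override def onTaskCompletion(context: TaskContext): Unit = {
          tmp.delete() // runs on the executor when this task finishes
        }
      })
      iter.map(_ * 2)
    }.count()
    sc.stop()
  }
}
```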

Regards,
Keith.

http://keith-chapman.com


[no subject]

2016-12-20 Thread satyajit vegesna
Hi All,

PFB sample code ,

val df = spark.read.parquet()
df.registerTempTable("df")
val zip = df.select("zip_code").distinct().as[String].rdd


def comp(zipcode:String):Unit={

val zipval = "SELECT * FROM df WHERE
zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
val data = spark.sql(zipval) //Throwing null pointer exception with RDD
data.write.parquet(..)

}

val sam = zip.map(x => comp(x))
sam.count

But when I do val zip =
df.select("zip_code").distinct().as[String].rdd.collect and call the
function, then I get the data computed, but in sequential order.

I would like to know why, when I try running map on the RDD, I get a null
pointer exception, and whether there is a way to compute the comp function for
each zip code in parallel, i.e. run multiple zip codes at the same time.

Any clue or inputs are appreciated.
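
One hedged note: spark (the SparkSession) exists only on the driver, so
spark.sql cannot be called inside an RDD map running on the executors, which is
consistent with the null pointer exception. A minimal sketch of an alternative
that writes one output directory per zip code in parallel (paths are
illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet("/input/path")  // illustrative input path

df.write
  .partitionBy("zip_code")                  // one directory per zip code, written in parallel
  .parquet("/output/path")                  // illustrative output path
```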

Regards.


[no subject]

2016-11-24 Thread Rostyslav Sotnychenko


[no subject]

2016-10-10 Thread Fei Hu
Hi All,

I am running some Spark Scala code in Zeppelin on CDH 5.5.1 (Spark version
1.5.0). I customized the Spark interpreter to use
org.apache.spark.serializer.KryoSerializer as spark.serializer, and in the
dependencies I added Kryo 3.0.3 as follows:
 com.esotericsoftware:kryo:3.0.3
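
For reference, a minimal sketch of the equivalent configuration in code, as one
would use with spark-submit outside Zeppelin (the app name is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-example") // illustrative
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)
```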


When I wrote the Scala notebook and ran the program, I got the following
errors. But if I compiled the code into a jar and used spark-submit to run
it on the cluster, it worked well without errors.

WARN [2016-10-10 23:43:40,801] ({task-result-getter-1}
Logging.scala[logWarning]:71) - Lost task 0.0 in stage 3.0 (TID 9,
svr-A3-A-U20): java.io.EOFException

at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:196)

at
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217)

at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1175)

at
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)

at
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)

at
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)

at
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)

at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)

at org.apache.spark.scheduler.Task.run(Task.scala:88)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


There were also some errors when I ran the Zeppelin Tutorial:

Caused by: java.io.IOException: java.lang.NullPointerException

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)

at
org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)

at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)

at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)

at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)

at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)

at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)

at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)

at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)

at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)

... 3 more

Caused by: java.lang.NullPointerException

at
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:38)

at
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23)

at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)

at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80)

at
org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:142)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:80)

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)

Does anyone know why this happened?

Thanks in advance,
Fei


[no subject]

2016-07-26 Thread thibaut
unsubscribe

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[no subject]

2016-05-22 Thread ????
I would like to contribute to Spark. I am working on SPARK-15429. Please give
me permission to contribute.

[no subject]

2015-12-01 Thread Alexander Pivovarov



[no subject]

2015-11-26 Thread Dmitry Tolpeko



[no subject]

2015-08-05 Thread Sandeep Giri
Yes, but in the take() approach we will be bringing the data to the driver,
and it is no longer distributed.

Also, take() takes only a count as its argument, which means that every time
we would be transferring redundant elements.


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.KnowBigData.com. http://KnowBigData.com.
Phone: +1-253-397-1945 (Office)



On Wed, Aug 5, 2015 at 3:09 PM, Sean Owen so...@cloudera.com wrote:

 I don't think countApprox is appropriate here unless approximation is OK.
 But more generally, counting everything matching a filter requires applying
 the filter to the whole data set, which seems like the thing to be avoided
 here.

 The take approach is better since it would stop after finding n matching
 elements (it might do a little extra work given partitioning and
 buffering). It would not filter the whole data set.

 The only downside there is that it would copy n elements to the driver.
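
A minimal sketch of the take-based helper being discussed (illustrative only;
this is not an existing RDD method):

```scala
import org.apache.spark.rdd.RDD

// Returns true if at least n elements satisfy the predicate. Scanning can stop
// once n matches are found, at the cost of copying up to n elements to the driver.
def existsAtLeast[T](rdd: RDD[T])(qualifies: T => Boolean, n: Int): Boolean =
  rdd.filter(qualifies).take(n).length >= n
```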


 On Wed, Aug 5, 2015 at 10:34 AM, Sandeep Giri sand...@knowbigdata.com
 wrote:

 Hi Jonathan,

 Does that guarantee a result? I do not see that it is really optimized.

 Hi Carsten,


 How does the following code work:

 data.filter(qualifying_function).take(n).count() >= n


 Also, as per my understanding, in both of the approaches you mentioned the
 qualifying function will be executed on the whole dataset even if the value was
 already found in the first element of the RDD:


- data.filter(qualifying_function).take(n).count() >= n
- val contains1MatchingElement = !(data.filter(qualifying_function).isEmpty())

 Isn't it? Am I missing something?


 Regards,
 Sandeep Giri,
 +1 347 781 4573 (US)
 +91-953-899-8962 (IN)

 www.KnowBigData.com. http://KnowBigData.com.
 Phone: +1-253-397-1945 (Office)



 On Fri, Jul 31, 2015 at 3:37 PM, Jonathan Winandy 
 jonathan.wina...@gmail.com wrote:

 Hello !

 You could try something like that :

 def exists[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean = {
   rdd.filter(f).countApprox(timeout = 1).getFinalValue().low >= n
 }

 It would work for large datasets and large values of n.

 Have a nice day,

 Jonathan



 On 31 July 2015 at 11:29, Carsten Schnober 
 schno...@ukp.informatik.tu-darmstadt.de wrote:

 Hi,
 the RDD class does not have an exists() method (in the Scala API), but
 the functionality you need seems easy to reproduce with the existing
 methods:

 val containsNMatchingElements =
 data.filter(qualifying_function).take(n).count() >= n

 Note: I am not sure whether the intermediate take(n) really increases
 performance, but the idea is to arbitrarily reduce the number of
 elements in the RDD before counting because we are not interested in the
 full count.

 If you need to check specifically whether there is at least one matching
 occurrence, it is probably preferable to use isEmpty() instead of
 count() and check whether the result is false:

 val contains1MatchingElement =
 !(data.filter(qualifying_function).isEmpty())

 Best,
 Carsten



 On 31.07.2015 at 11:11, Sandeep Giri wrote:
  Dear Spark Dev Community,
 
  I am wondering if there is already a function to solve my problem. If
  not, then should I work on this?
 
  Say you just want to check if a word exists in a huge text file. I
 could
  not find better ways than those mentioned here
  
 http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2#q6
 .
 
  So, I was proposing that we have a function called exists in RDD with
  the following signature:
 
  # returns true if n elements exist which qualify our criteria.
  # the qualifying function would receive the element and its index and
  # return true or false.
  def exists(qualifying_function, n):
   
 
 
  Regards,
  Sandeep Giri,
  +1 347 781 4573 (US)
  +91-953-899-8962 (IN)
 
  www.KnowBigData.com. http://KnowBigData.com.
  Phone: +1-253-397-1945 (Office)
 
 

 --
 Carsten Schnober
 Doctoral Researcher
 Ubiquitous Knowledge Processing (UKP) Lab
 FB 20 / Computer Science Department
 Technische Universität Darmstadt
 Hochschulstr. 10, D-64289 Darmstadt, Germany
 phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
 schno...@ukp.informatik.tu-darmstadt.de
 www.ukp.tu-darmstadt.de

 Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
 GRK 1994: