Re: SQLListener concurrency bug?

2017-06-26 Thread Shixiong(Ryan) Zhu
Right now they are safe because the caller also calls synchronized when
using them. This is to avoid copying objects. It's probably a bad design.
If you want to refactor them, PR is welcome.
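
For readers following along, the hazard and the copy-on-read alternative can be sketched in plain Java (a hypothetical Listener holding String ids, not Spark's SQLListener or its types). Returning the bare reference publishes the mutable list outside the lock, while copying inside the synchronized block yields a snapshot that is safe to read later, at the cost of the allocation Ryan mentions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch of the pattern under discussion; not Spark's actual classes.
class Listener {
    private final List<String> failedExecutions = new ArrayList<>();

    // Unsafe variant: the lock is released before the caller iterates,
    // so a concurrent writer can mutate the list mid-iteration.
    synchronized List<String> getFailedExecutionsUnsafe() {
        return failedExecutions;
    }

    // Safe variant: copy while holding the lock; the returned snapshot
    // can be read without further synchronization.
    synchronized List<String> getFailedExecutionsCopy() {
        return Collections.unmodifiableList(new ArrayList<>(failedExecutions));
    }

    synchronized void onFailure(String id) {
        failedExecutions.add(id);
    }
}
```

With the copy, a snapshot taken before a later write does not observe that write; the bare reference does.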

On Mon, Jun 26, 2017 at 2:27 AM, Oleksandr Vayda 
wrote:

> Hi all,
>
> Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener,
> specifically this place -
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328
>
> def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
>   failedExecutions
> }
>
> def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
>   completedExecutions
> }
> I believe the synchronized block is used here incorrectly. If I get it
> right the main purpose here is to synchronize access to the mutable
> collections from the UI (read) and the event bus (read/write) threads. But
> in the current implementation the "synchronized" blocks return bare
> references to mutable collections and in fact nothing gets synchronized.
> Is it a bug?
>
> Sincerely yours,
> Oleksandr Vayda
>
> mobile: +420 604 113 056 <+420%20604%20113%20056>
>


Re: how to mention others in JIRA comment please?

2017-06-26 Thread 萝卜丝炒饭
thank you


 
---Original---
From: "Ted Yu"
Date: 2017/6/27 10:18:18
To: "萝卜丝炒饭"<1427357...@qq.com>;
Cc: "user";"dev";
Subject: Re: how to mention others in JIRA comment please?


You can find the JIRA handle of the person you want to mention by going to a 
JIRA where that person has commented.


E.g., say you want to find the handle for Joseph.
You can go to:
https://issues.apache.org/jira/browse/SPARK-6635



and click on his name in comment:
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=josephkb



The following constitutes a mention for him:
[~josephkb]


FYI


On Mon, Jun 26, 2017 at 6:56 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
Hi all,


how to mention others in JIRA comment please?

I added @ before other members' names, but it didn't work.


Would you please help me?


thanks
Fei Shao

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Ted Yu
You can find the JIRA handle of the person you want to mention by going to
a JIRA where that person has commented.

E.g., say you want to find the handle for Joseph.
You can go to:
https://issues.apache.org/jira/browse/SPARK-6635

and click on his name in comment:
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=josephkb

The following constitutes a mention for him:
[~josephkb]

FYI

On Mon, Jun 26, 2017 at 6:56 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:

> Hi all,
>
> how to mention others in JIRA comment please?
> I added @ before other members' names, but it didn't work.
>
> Would you please help me?
>
> thanks
> Fei Shao
>


how to mention others in JIRA comment please?

2017-06-26 Thread 萝卜丝炒饭
Hi all,


how to mention others in JIRA comment please?

I added @ before other members' names, but it didn't work.


Would you please help me?


thanks
Fei Shao

Re: issue about the windows slice of stream

2017-06-26 Thread 萝卜丝炒饭
Hi  Owen,


Would you please help me check this issue?
Is it a potential bug or not?


thanks
Fei Shao




 
---Original---
From: "萝卜丝炒饭"<1427357...@qq.com>
Date: 2017/6/25 21:44:41
To: "user";"dev";
Subject: Re: issue about the windows slice of stream


Hi all,


Let me add more info about this.
The log showed:
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Time 1498383086000 ms is valid
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Window time = 2000 ms
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Slide time = 8000 ms
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Zero time = 1498383078000 ms
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Current window = [1498383085000 
ms, 1498383086000 ms]
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Previous window = 
[1498383077000 ms, 1498383078000 ms]
17/06/25 17:31:26 INFO ShuffledDStream: Slicing from 1498383077000 ms to 
1498383084000 ms (aligned to 1498383077000 ms and 1498383084000 ms)
17/06/25 17:31:26 INFO ShuffledDStream: Time 1498383078000 ms is invalid as 
zeroTime is 1498383078000 ms , slideDuration is 1000 ms and difference is 0 ms
17/06/25 17:31:26 DEBUG ShuffledDStream: Time 1498383079000 ms is valid
17/06/25 17:31:26 DEBUG MappedDStream: Time 1498383079000 ms is valid
The slice time is wrong.


For my test code:
lines.countByValueAndWindow( Seconds(2), Seconds(8)).foreachRDD( s => { // <===
here the windowDuration is 2 seconds and the slideDuration is 8 seconds.
===key log begin
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Current window = [1498383085000
ms, 1498383086000 ms]
17/06/25 17:31:26 DEBUG ReducedWindowedDStream: Previous window =
[1498383077000 ms, 1498383078000 ms]
17/06/25 17:31:26 INFO ShuffledDStream: Slicing from 1498383077000 ms to
1498383084000 ms (aligned to 1498383077000 ms and 1498383084000 ms) // <=== here,
the old RDDs slice from 1498383077000 to 1498383084000. That is 8 seconds;
actually it should be 2 seconds.
===key log end
===code in ReducedWindowedDStream.scala begin
override def compute(validTime: Time): Option[RDD[(K, V)]] = {
  val reduceF = reduceFunc
  val invReduceF = invReduceFunc

  val currentTime = validTime
  val currentWindow = new Interval(
    currentTime - windowDuration + parent.slideDuration, currentTime)
  val previousWindow = currentWindow - slideDuration

  logDebug("Window time = " + windowDuration)
  logDebug("Slide time = " + slideDuration)
  logDebug("Zero time = " + zeroTime)
  logDebug("Current window = " + currentWindow)
  logDebug("Previous window = " + previousWindow)

  //  _____________________________
  // |  previous window   _________|___________________
  // |___________________|       current window        |  --> Time
  //                     |_____________________________|
  //
  // |________ _________|          |________ _________|
  //          |                             |
  //          V                             V
  //       old RDDs                     new RDDs
  //

  // Get the RDDs of the reduced values in "old time steps"
  val oldRDDs =
    reducedStream.slice(previousWindow.beginTime,
      currentWindow.beginTime - parent.slideDuration)
  // <=== I think this line should be
  // "reducedStream.slice(previousWindow.beginTime,
  //   currentWindow.beginTime + windowDuration - parent.slideDuration)"
  logDebug("# old RDDs = " + oldRDDs.size)

  // Get the RDDs of the reduced values in "new time steps"
  val newRDDs =
    reducedStream.slice(previousWindow.endTime + parent.slideDuration,
      currentWindow.endTime)
  // <=== this line should be
  // "reducedStream.slice(previousWindow.endTime + windowDuration - parent.slideDuration,
  //   currentWindow.endTime)"
  logDebug("# new RDDs = " + newRDDs.size)
===code in ReducedWindowedDStream.scala end
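
To make the reported mismatch concrete, the slice bounds from the log can be recomputed with plain arithmetic (a standalone sketch; the class and variable names here are mine, not Spark's):

```java
// Recomputing the slice bounds reported in the log above (all times in ms).
// Stand-in arithmetic only; Spark's Interval/Duration classes are not used.
public class WindowSliceCheck {
    public static void main(String[] args) {
        long parentSlide = 1000L;     // batch interval: 1 s
        long windowDuration = 2000L;  // countByValueAndWindow window: 2 s
        long slideDuration = 8000L;   // countByValueAndWindow slide: 8 s
        long currentTime = 1498383086000L;

        // As in compute(): currentWindow = [currentTime - window + parentSlide, currentTime]
        long currentBegin = currentTime - windowDuration + parentSlide; // 1498383085000
        long previousBegin = currentBegin - slideDuration;              // 1498383077000

        // Current code slices old RDDs up to currentWindow.beginTime - parent.slideDuration
        long oldSliceEnd = currentBegin - parentSlide;                  // 1498383084000
        long batches = (oldSliceEnd - previousBegin) / parentSlide + 1; // slice is inclusive

        // Matches the log's "Slicing from 1498383077000 ms to 1498383084000 ms":
        // 8 one-second batches of "old" data for a 2-second window.
        System.out.println("old RDD slice: " + previousBegin + " .. " + oldSliceEnd
                + " (" + batches + " batches)");
    }
}
```

This reproduces the 8-batch span the log shows, which is the discrepancy being reported.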



Thanks
Fei Shao
 
---Original---
From: "??"<1427357...@qq.com>
Date: 2017/6/24 14:51:52
To: "user";"dev";
Subject: issue about the windows slice of stream


Hi all,
I found an issue about the windows slice of dstream.
My code is :


val ssc = new StreamingContext(conf, Seconds(1))


val content = ssc.socketTextStream("ip", port)
content.countByValueAndWindow(Seconds(2), Seconds(8)).foreachRDD(_.foreach(println))
The key point is that the slide duration is greater than the window duration.
I checked the output. The result from foreachRDD was wrong.
I found the stream was sliced at the wrong boundaries.
Can I open a JIRA please?


thanks
Fei Shao

Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-26 Thread Shixiong(Ryan) Zhu
Hey Assaf,

You need to change "v2.2.0" to "v2.2.0-rc5" in GitHub links because there is no
v2.2.0 right now.

On Mon, Jun 26, 2017 at 12:57 AM, assaf.mendelson 
wrote:

> Not a show stopper, however, I was looking at the structured streaming
> programming guide and under arbitrary stateful operations (
> https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/structured-streaming-programming-guide.html#arbitrary-stateful-operations)
> the suggestion is to take a look at
> the examples (Scala/Java). These link to a non-existent file (called
> StructuredSessionization or JavaStructuredSessionization; I couldn’t find
> either of these files in the repository).
>
> If the example file exists, I think it would be nice to add it, otherwise
> I would suggest simply removing the examples link from the programming
> guide (there are examples inside the group state API
> https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/api/scala/index.html#org.apache.spark.sql.streaming.GroupState).
>
>
>
> Thanks,
>
>   Assaf.
>
>
>
> *From:* Michael Armbrust [via Apache Spark Developers List] [mailto:ml+[hidden
> email] ]
> *Sent:* Wednesday, June 21, 2017 2:50 AM
> *To:* Mendelson, Assaf
> *Subject:* [VOTE] Apache Spark 2.2.0 (RC5)
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Spark 2.2.0
>
> [ ] -1 Do not release this package because ...
>
>
>
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
>
>
> The tag to be voted on is v2.2.0-rc5 (62e442e73a2fa663892d2edaff5f7d72d7f402ed)
>
>
>
> List of JIRA tickets resolved can be found with this filter.
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-bin/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/pwendell.asc
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1243/
>
>
>
> The documentation corresponding to this release can be found at:
>
> https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/
>
>
>
>
>
> *FAQ*
>
>
>
> *How can I help test this release?*
>
>
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
>
>
> *What should happen to JIRA tickets still targeting 2.2.0?*
>
>
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>
>
>
> *But my bug isn't fixed!??!*
>
>
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.1.
>
>
> --
>
> *If you reply to this email, your message will be added to the discussion
> below:*
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Apache-Spark-2-2-0-RC5-tp21815.html
>
> To start a new topic under Apache Spark Developers List, email [hidden
> email] 
> To unsubscribe from Apache Spark Developers List, click here.
>
> --
> View this message in context: RE: [VOTE] Apache Spark 2.2.0 (RC5)
> 
> Sent from the Apache Spark Developers List mailing list archive
>  at
> Nabble.com.
>


Re: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-26 Thread Michael Armbrust
Okay, this vote fails. Following up with RC6 shortly.

On Wed, Jun 21, 2017 at 12:51 PM, Imran Rashid  wrote:

> -1
>
> I'm sorry for discovering this so late, but I just filed
> https://issues.apache.org/jira/browse/SPARK-21165 which I think should be
> a blocker; it's a regression from 2.1.
>
> On Wed, Jun 21, 2017 at 1:43 PM, Nick Pentreath 
> wrote:
>
>> As before, release looks good, all Scala, Python tests pass. R tests fail
>> with same issue in SPARK-21093 but it's not a blocker.
>>
>> +1 (binding)
>>
>>
>> On Wed, 21 Jun 2017 at 01:49 Michael Armbrust 
>> wrote:
>>
>>> I will kick off the voting with a +1.
>>>
>>> On Tue, Jun 20, 2017 at 4:49 PM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.2.0. The vote is open until Friday, June 23rd, 2017 at 18:00
 PST and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.2.0
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see https://spark.apache.org/

 The tag to be voted on is v2.2.0-rc5 (62e442e73a2fa663892d2edaff5f7d72d7f402ed)

 List of JIRA tickets resolved can be found with this filter.

 The release files, including signatures, digests, etc. can be found at:
 https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1243/

 The documentation corresponding to this release can be found at:
 https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/


 *FAQ*

 *How can I help test this release?*

 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions.

 *What should happen to JIRA tickets still targeting 2.2.0?*

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should be
 worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

 *But my bug isn't fixed!??!*

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from 2.1.1.

>>>
>>>
>


Re: Is there something wrong with jenkins?

2017-06-26 Thread Sean Owen
The Arrow change broke the build:
https://github.com/apache/spark/pull/15821#issuecomment-310894657

Do we need to revert this? I don't want to, but it's also blocking testing.

On Mon, Jun 26, 2017 at 12:19 PM Yuming Wang  wrote:

> Hi All,
>
> Is there something wrong with jenkins?
>
>
> # To activate this environment, use:
> # $ source activate /tmp/tmp.tWAUGnH6wZ/3.5
> #
> # To deactivate this environment, use:
> # $ source deactivate
> #
> discarding /home/anaconda/bin from PATH
> prepending /tmp/tmp.tWAUGnH6wZ/3.5/bin to PATH
> Fetching package metadata: ..SSL verification error: hostname 
> 'conda.binstar.org' doesn't match either of 'anaconda.com', 
> 'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 'wakari.io'
> .SSL verification error: hostname 'conda.binstar.org' doesn't match either of 
> 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 
> 'wakari.io'
> ...
> Solving package specifications: .
> Error:  Package missing in current linux-64 channels:
>   - pyarrow 0.4|0.4.0*
>
> You can search for this package on anaconda.org with
>
> anaconda search -t conda pyarrow 0.4|0.4.0*
>
> You may need to install the anaconda-client command line client with
>
> conda install anaconda-client
> Cleaning up temporary directory - /tmp/tmp.tWAUGnH6wZ
> [error] running 
> /home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ; received 
> return code 1
> Attempting to post to Github...
>  > Post successful.
> Build step 'Execute shell' marked build as failure
> Archiving artifacts
> Recording test results
> Test FAILed.
> Refer to this link for build results (access rights to CI server needed): 
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78627/
> Test FAILed.
> Finished: FAILURE
>
>
> more logs: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78627/console
>
>
>
> Thanks!
>
>


Is there something wrong with jenkins?

2017-06-26 Thread Yuming Wang
Hi All,

Is there something wrong with jenkins?


# To activate this environment, use:
# $ source activate /tmp/tmp.tWAUGnH6wZ/3.5
#
# To deactivate this environment, use:
# $ source deactivate
#
discarding /home/anaconda/bin from PATH
prepending /tmp/tmp.tWAUGnH6wZ/3.5/bin to PATH
Fetching package metadata: ..SSL verification error: hostname
'conda.binstar.org' doesn't match either of 'anaconda.com',
'anacondacloud.com', 'anacondacloud.org', 'binstar.org', 'wakari.io'
.SSL verification error: hostname 'conda.binstar.org' doesn't match
either of 'anaconda.com', 'anacondacloud.com', 'anacondacloud.org',
'binstar.org', 'wakari.io'
...
Solving package specifications: .
Error:  Package missing in current linux-64 channels:
  - pyarrow 0.4|0.4.0*

You can search for this package on anaconda.org with

anaconda search -t conda pyarrow 0.4|0.4.0*

You may need to install the anaconda-client command line client with

conda install anaconda-client
Cleaning up temporary directory - /tmp/tmp.tWAUGnH6wZ
[error] running
/home/jenkins/workspace/SparkPullRequestBuilder/dev/run-pip-tests ;
received return code 1
Attempting to post to Github...
 > Post successful.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Test FAILed.
Refer to this link for build results (access rights to CI server
needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78627/
Test FAILed.
Finished: FAILURE


more logs: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78627/console



Thanks!


Re: Question on Spark code

2017-06-26 Thread Steve Loughran

On 25 Jun 2017, at 20:57, kant kodali wrote:

impressive! I need to learn more about scala.

What I mean stripping away conditional check in Java is this.

static final boolean isLogInfoEnabled = false;

public void logMessage(String message) {
    if (isLogInfoEnabled) {
        log.info(message);
    }
}

If you look at the byte code the dead if check will be removed.





Generally it's skipped in Java too, now that people have moved to the SLF4J APIs,
which do on-demand string expansion:

LOG.info("network IO failure from {} source to {}", src, dest, ex)

That only builds the final string, calling src.toString() and dest.toString(),
when needed, handling null values too. So you can skip those guards everywhere.
But the string template is still constructed; it's not free, and there's some
merit in maintaining the guard at debug level, though I don't personally bother.

The Spark one takes a closure, so it can do much more. However, you shouldn't
do anything with side effects, or indeed, anything prone to throwing 
exceptions. Always try to write .toString() methods which are robust against 
null values, that is: valid for the entire life of an instance. Your debuggers 
will appreciate it too.
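
The deferred-evaluation idea can be sketched in plain Java with a Supplier. This is a toy logger made up for illustration, not SLF4J's API (though Log4j 2 ships comparable lambda overloads): the message expression is only evaluated when the level check passes.

```java
import java.util.function.Supplier;

// Toy logger sketching lazy message construction; not a real logging API.
class LazyLogger {
    private final boolean infoEnabled;

    LazyLogger(boolean infoEnabled) {
        this.infoEnabled = infoEnabled;
    }

    // The Supplier is only invoked when INFO is enabled, so expensive
    // toString() calls and string concatenation are skipped otherwise.
    void info(Supplier<String> msg) {
        if (infoEnabled) {
            System.out.println("INFO " + msg.get());
        }
    }
}
```

Usage such as `logger.info(() -> "network IO failure from " + src + " to " + dest)` builds nothing at all when INFO is disabled, which is the same property Spark's closure-taking log methods give you.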



SQLListener concurrency bug?

2017-06-26 Thread Oleksandr Vayda
Hi all,

Reading the source code of the org.apache.spark.sql.execution.ui.SQLListener,
specifically this place -
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L328

def getFailedExecutions: Seq[SQLExecutionUIData] = synchronized {
  failedExecutions
}

def getCompletedExecutions: Seq[SQLExecutionUIData] = synchronized {
  completedExecutions
}
I believe the synchronized block is used here incorrectly. If I get it
right the main purpose here is to synchronize access to the mutable
collections from the UI (read) and the event bus (read/write) threads. But
in the current implementation the "synchronized" blocks return bare
references to mutable collections and in fact nothing gets synchronized.
Is it a bug?

Sincerely yours,
Oleksandr Vayda

mobile: +420 604 113 056


RE: [VOTE] Apache Spark 2.2.0 (RC5)

2017-06-26 Thread assaf.mendelson
Not a show stopper, however, I was looking at the structured streaming 
programming guide and under arbitrary stateful operations 
(https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/structured-streaming-programming-guide.html#arbitrary-stateful-operations)
 the suggestion is to take a look at the examples 
(Scala/Java).
 These link to a non-existent file (called StructuredSessionization or
JavaStructuredSessionization, I couldn’t find either of these files in the 
repository).
If the example file exists, I think it would be nice to add it, otherwise I 
would suggest simply removing the examples link from the programming guide 
(there are examples inside the group state API 
https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/api/scala/index.html#org.apache.spark.sql.streaming.GroupState).

Thanks,
  Assaf.

From: Michael Armbrust [via Apache Spark Developers List] 
[mailto:ml+s1001551n21815...@n3.nabble.com]
Sent: Wednesday, June 21, 2017 2:50 AM
To: Mendelson, Assaf
Subject: [VOTE] Apache Spark 2.2.0 (RC5)

Please vote on releasing the following candidate as Apache Spark version 2.2.0. 
The vote is open until Friday, June 23rd, 2017 at 18:00 PST and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.2.0-rc5 (62e442e73a2fa663892d2edaff5f7d72d7f402ed)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1243/

The documentation corresponding to this release can be found at:
https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc5-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release unless 
the bug in question is a regression from 2.1.1.


If you reply to this email, your message will be added to the discussion below:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Apache-Spark-2-2-0-RC5-tp21815.html
To start a new topic under Apache Spark Developers List, email 
ml+s1001551n1...@n3.nabble.com
To unsubscribe from Apache Spark Developers List, click 
here.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/RE-VOTE-Apache-Spark-2-2-0-RC5-tp21863.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.