[kwin] [Bug 484323] High CPU load of kwin_x11 when locking the screen

2024-04-11 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=484323

--- Comment #15 from Marcelo Vanzin  ---
Today's Neon package updates seem to have fixed the issue for me. It still
reports the same Plasma version as before (6.0.3), but I think the Qt library
was updated.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kwin] [Bug 484323] High CPU load of kwin_x11 when locking the screen

2024-04-10 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=484323

--- Comment #13 from Marcelo Vanzin  ---
> Can you check if the problem also happens when the external monitor is not 
> plugged in?

It happens for me without the external monitor being plugged in, but only on my
laptop (Intel graphics card). On my desktop (NVIDIA card) it does not seem to
happen. Let me know if there's any more info I can collect.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kwin] [Bug 485000] Context menus on right click not working properly with custom Aurorae themes; aside from default Breeze theme

2024-04-03 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=485000

Marcelo Vanzin  changed:

   What|Removed |Added

 CC||mmvgro...@gmail.com

--- Comment #3 from Marcelo Vanzin  ---
I can confirm this happens with non-Breeze themes. I was using Plastik (I don't
know if that falls under "Aurorae themes"). I have a small video showing the
issue if that's helpful, but the already-attached image shows the same thing.

-- 
You are receiving this mail because:
You are watching all bug changes.

[kwin] [Bug 484323] High CPU load of kwin_x11 when locking the screen

2024-04-03 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=484323

Marcelo Vanzin  changed:

   What|Removed |Added

 CC||mmvgro...@gmail.com

-- 
You are receiving this mail because:
You are watching all bug changes.

[Kernel-packages] [Bug 2057484] Re: kernel worker event freezes during traffic on iwlwifi

2024-03-18 Thread Marcelo Vanzin
I'm not sure the problem I'm having is exactly the same, but I also
can't get any kernel after 6.5.0-21 to properly boot up on my laptop
with an Intel wifi card.

I'm attaching the portion of kern.log where a bunch of kernel worker
tasks are reported as timed out; that happened, IIRC, after I tried to
shut down the system by hitting the power button. The system wasn't
really usable since most processes weren't able to start, so I couldn't
really do a lot of investigation. Even trying to do "sudo blah" or just
"ifconfig" would hang indefinitely.

If it's any help, I have a bunch of options set for the Intel wifi driver:
options iwlwifi power_save=0
options iwlmvm power_scheme=1
options iwldvm power_scheme=1
options iwlwifi bt_coex_active=enable


** Attachment added: "Kernel worker timeout reports from kern.log"
   
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-6.5/+bug/2057484/+attachment/5756935/+files/kern.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-hwe-6.5 in Ubuntu.
https://bugs.launchpad.net/bugs/2057484

Title:
  kernel worker event freezes during traffic  on iwlwifi

Status in linux-signed-hwe-6.5 package in Ubuntu:
  Confirmed

Bug description:
  With traffic on an Intel Wi-Fi 7 BE200 card, the kernel enters 100%
  CPU usage on a worker event thread for a while and kworker/0:0-events_freezable
  is in D state. The system is highly unresponsive during this time.

  Happens in 6.5.0-25, does not happen in 6.5.0-17

  
  Description: Ubuntu 22.04.4 LTS
  Release: 22.04

Installed: 6.5.0-25.25~22.04.1
Candidate: 6.5.0-25.25~22.04.1
Version table:
   *** 6.5.0-25.25~22.04.1 500
  500 http://us.archive.ubuntu.com/ubuntu jammy-updates/main amd64 
Packages
  500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 
Packages
  100 /var/lib/dpkg/status

  linux-modules-6.5.0-25-generic:
Installed: 6.5.0-25.25~22.04.1
Candidate: 6.5.0-25.25~22.04.1
Version table:
   *** 6.5.0-25.25~22.04.1 500
  500 http://us.archive.ubuntu.com/ubuntu jammy-updates/main amd64 
Packages
  500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 
Packages
  100 /var/lib/dpkg/status

  filename:   
/lib/modules/6.5.0-25-generic/kernel/drivers/net/wireless/intel/iwlwifi/iwlwifi.ko
  license:GPL
  description:Intel(R) Wireless WiFi driver for Linux
  firmware:   iwlwifi-100-5.ucode
  firmware:   iwlwifi-1000-5.ucode
  firmware:   iwlwifi-135-6.ucode
  firmware:   iwlwifi-105-6.ucode
  firmware:   iwlwifi-2030-6.ucode
  firmware:   iwlwifi-2000-6.ucode
  firmware:   iwlwifi-5150-2.ucode
  firmware:   iwlwifi-5000-5.ucode
  firmware:   iwlwifi-6000g2b-6.ucode
  firmware:   iwlwifi-6000g2a-6.ucode
  firmware:   iwlwifi-6050-5.ucode
  firmware:   iwlwifi-6000-6.ucode
  firmware:   iwlwifi-7265D-29.ucode
  firmware:   iwlwifi-7265-17.ucode
  firmware:   iwlwifi-3168-29.ucode
  firmware:   iwlwifi-3160-17.ucode
  firmware:   iwlwifi-7260-17.ucode
  firmware:   iwlwifi-8265-36.ucode
  firmware:   iwlwifi-8000C-36.ucode
  firmware:   iwlwifi-9260-th-b0-jf-b0-46.ucode
  firmware:   iwlwifi-9000-pu-b0-jf-b0-46.ucode
  firmware:   iwlwifi-cc-a0-77.ucode
  firmware:   iwlwifi-QuZ-a0-jf-b0-77.ucode
  firmware:   iwlwifi-QuZ-a0-hr-b0-77.ucode
  firmware:   iwlwifi-Qu-b0-jf-b0-77.ucode
  firmware:   iwlwifi-Qu-c0-hr-b0-77.ucode
  firmware:   iwlwifi-Qu-b0-hr-b0-77.ucode
  firmware:   iwlwifi-ma-b0-mr-a0-83.ucode
  firmware:   iwlwifi-ma-b0-gf4-a0-83.ucode
  firmware:   iwlwifi-ma-b0-gf-a0-83.ucode
  firmware:   iwlwifi-ma-b0-hr-b0-83.ucode
  firmware:   iwlwifi-ma-a0-mr-a0-83.ucode
  firmware:   iwlwifi-ma-a0-gf4-a0-83.ucode
  firmware:   iwlwifi-ma-a0-gf-a0-83.ucode
  firmware:   iwlwifi-ma-a0-hr-b0-83.ucode
  firmware:   iwlwifi-ty-a0-gf-a0-83.ucode
  firmware:   iwlwifi-so-a0-gf-a0-83.ucode
  firmware:   iwlwifi-so-a0-hr-b0-83.ucode
  firmware:   iwlwifi-so-a0-jf-b0-83.ucode
  firmware:   iwlwifi-gl-c0-fm-c0-83.ucode
  firmware:   iwlwifi-gl-b0-fm-b0-83.ucode
  firmware:   iwlwifi-bz-a0-fm4-b0-83.ucode
  firmware:   iwlwifi-bz-a0-fm-c0-83.ucode
  firmware:   iwlwifi-bz-a0-fm-b0-83.ucode
  firmware:   iwlwifi-bz-a0-gf4-a0-83.ucode
  firmware:   iwlwifi-bz-a0-gf-a0-83.ucode
  firmware:   iwlwifi-bz-a0-hr-b0-83.ucode
  firmware:   iwlwifi-sc-a0-wh-a0-83.ucode
  firmware:   iwlwifi-sc-a0-gf4-a0-83.ucode
  firmware:   iwlwifi-sc-a0-gf-a0-83.ucode
  firmware:   iwlwifi-sc-a0-hr-b0-83.ucode
  firmware:   iwlwifi-sc-a0-hr-b0-83.ucode
  firmware:   iwlwifi-sc-a0-fm-c0-83.ucode
  firmware:   iwlwifi-sc-a0-fm-b0-83.ucode
  srcversion: CE34BB7E0E287E10FEC1E13
  alias:  pci:v8086dE440sv*sd*bc*sc*i*
  alias:  


[konsole] [Bug 483951] konsole crashes when text selection contains emoji

2024-03-18 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=483951

--- Comment #1 from Marcelo Vanzin  ---
Oops, I forgot to fill in some info in the template:
Operating System: KDE neon 6.0
KDE Plasma Version: 6.0.2
KDE Frameworks Version: 6.0.0
Qt Version: 6.6.2
Kernel Version: 6.5.0-25-generic (64-bit)
Graphics Platform: X11

This also happens on Wayland.

-- 
You are receiving this mail because:
You are watching all bug changes.

[konsole] [Bug 483951] New: konsole crashes when text selection contains emoji

2024-03-18 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=483951

Bug ID: 483951
   Summary: konsole crashes when text selection contains emoji
Classification: Applications
   Product: konsole
   Version: 24.02.0
  Platform: Neon
OS: Linux
Status: REPORTED
  Severity: crash
  Priority: NOR
 Component: copy-paste
  Assignee: konsole-de...@kde.org
  Reporter: mmvgro...@gmail.com
  Target Milestone: ---

Created attachment 167440
  --> https://bugs.kde.org/attachment.cgi?id=167440&action=edit
Crash report

SUMMARY

STEPS TO REPRODUCE
1. make sure there's some output in the console containing emoji (e.g. ❤️; see the snippet below)
2. try to select the text with the emojis in it
3. there is no step 3
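
For step 1, any command that writes emoji to the terminal should do; a trivial
Python one-liner (illustrative only, not part of the original report):

    # Print a line containing emoji so it can be selected in the terminal.
    print("selection test: ❤️ 🙂 ❤️")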

OBSERVED RESULT
konsole crashes

EXPECTED RESULT
konsole does not crash



SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
Crash report attached.

-- 
You are receiving this mail because:
You are watching all bug changes.

[KScreen] [Bug 482354] New: org.freedesktop.ScreenSaver::Inhibit ignored by Plasma 6

2024-03-03 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=482354

Bug ID: 482354
   Summary: org.freedesktop.ScreenSaver::Inhibit ignored by Plasma
6
Classification: Plasma
   Product: KScreen
   Version: 6.0.0
  Platform: Other
OS: Linux
Status: REPORTED
  Severity: normal
  Priority: NOR
 Component: common
  Assignee: kscreen-bugs-n...@kde.org
  Reporter: mmvgro...@gmail.com
  Target Milestone: ---

Created attachment 166380
  --> https://bugs.kde.org/attachment.cgi?id=166380&action=edit
script to inhibit screen saver using freedesktop dbus service

SUMMARY
I'm not sure if this is the right component, but since upgrading to Plasma 6,
applications that inhibit the screen saver / display power management have
failed to do so. For example, playing a video in Chrome will not prevent the
screen from turning off.

I also had a script I wrote a long time ago (attached) to prevent the display
from turning off, and it does not seem to work with Plasma 6.
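
For reference, a minimal sketch of what such a script can look like, using the
dbus-python bindings (this is an illustration, not the attached script; the
application name and reason strings are arbitrary):

    import time

    import dbus

    # Connect to the session bus and look up the freedesktop screen saver service.
    bus = dbus.SessionBus()
    proxy = bus.get_object("org.freedesktop.ScreenSaver",
                           "/org/freedesktop/ScreenSaver")
    screensaver = dbus.Interface(proxy,
                                 dbus_interface="org.freedesktop.ScreenSaver")

    # Inhibit() returns a cookie that must be passed back to UnInhibit().
    cookie = screensaver.Inhibit("inhibit-test", "reproducing bug 482354")
    try:
        # While the cookie is held, the screen should not turn off.
        time.sleep(600)
    finally:
        screensaver.UnInhibit(cookie)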

STEPS TO REPRODUCE
1. Turn off display auto-turn-off in settings
2. Run application that inhibits screen saver
3. Watch as screen still turns off

OBSERVED RESULT
Screen turns off after configured time.

EXPECTED RESULT
Screen does not turn off while application has inhibited the screen saver.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Neon 22.04 / Plasma 6 / X11
KDE Plasma Version: 6.0.0
KDE Frameworks Version:  6.0.0 (?)
Qt Version: 6.6.2

-- 
You are receiving this mail because:
You are watching all bug changes.

Re: [commons-crypto] OpenSSL 3.x, FIPS, digests, and HMAC

2023-08-26 Thread Marcelo Vanzin
Oh, wow, I think I heard my name.

First, a big disclaimer: I haven't really worked with any of this stuff
for years, so take what I say with a big pinch of salt. I have no idea
what the current state of things is in OpenSSL-land.

As far as I can remember, the desire to do runtime discovery was for ease
of packaging software that uses the library. For my company at the time, the
main use was Apache Spark and other software that people don't generally
embed into their apps, but instead run directly from a package provided by
Apache (or some distribution). Having to build multiple versions of an
already large package, and having to teach people to figure out which
version they need, was just too much of a hassle.
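
The gist of that runtime discovery can be sketched roughly like this (an
illustration of the idea in Python/ctypes only, not the actual commons-crypto
JNI code; the library lookup and the 3.x version threshold are assumptions):

    import ctypes
    import ctypes.util

    # Load whatever libcrypto happens to be installed at runtime.
    libname = ctypes.util.find_library("crypto") or "libcrypto.so"
    libcrypto = ctypes.CDLL(libname)

    try:
        # OpenSSL 1.1+ and 3.x expose OpenSSL_version_num().
        libcrypto.OpenSSL_version_num.restype = ctypes.c_ulong
        version = libcrypto.OpenSSL_version_num()
    except AttributeError:
        # Older 1.0.x builds only expose SSLeay().
        libcrypto.SSLeay.restype = ctypes.c_ulong
        version = libcrypto.SSLeay()

    if version >= 0x30000000:
        print("0x%08x detected: use the 3.x code paths" % version)
    else:
        print("0x%08x detected: use the legacy 1.x code paths" % version)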

Maybe there are better ways to do that runtime discovery without the C code
having to introspect into the shared libraries. Maybe that isn't even
needed these days because the libssl API is more stable and the world has
moved on from 1.0.

Anyway, I won't make suggestions because I don't want to talk about things
that I'm completely out of touch with these days, but please don't hold
back on making changes because of something that, at the time, might have
made sense but these days may not.


On Fri, Aug 25, 2023 at 6:48 PM Alex Remily  wrote:

>  the current module into different maven modules? Not support both?>
>
> Agreed.  Just to provide some history, when I was working on the 1.1.x
> upgrade I was guided by commons committer Marcelo Vanzin.  Marcelo required
> a design that supported runtime discovery of the underlying openssl API.  I
> don't recall all of the rationale for the requirement, but he insisted that
> any commons-crypto upgrade must support legacy and current versions of
> openssl transparently to the calling program.  The end result is what we
> have now.  I don't recall where in the code commons-crypto makes the
> underlying version checks, at library initialization time or when a
> specific function is called, but the end result is that users need only
> download the latest commons-crypto release regardless of their underlying
> openssl API--as long as they are running supported openssl versions.
>
> In my view there are a few open questions regarding the current approach as
> compared to an API-specific one.  One, what is the performance penalty
> associated with the dynamic version checks?  Two, how much complexity does
> it introduce into the codebase?  Finally, what was the use case that drove
> the runtime checking requirement?  Marcelo could answer the last question.
> I don't know if he is still involved in the community (I haven't seen him
> around for awhile.  IIRC, he was primarily a Spark committer).
>
> Another consideration is the FIPS certification.  I work in a heavily
> regulated industry and FIPS is a real constraint.  I haven't personally
> encountered a requirement to deploy a FIPS compliant openssl in our
> application code, but it's probably just a matter of time.  In terms of
> expanding our user base, it may make sense to provide that capability.  It
> doesn't seem to be generally available from an existing provider.
>
> Regarding message digests/HMAC, I question whether the performance gain
> from native code would significantly outperform some of the modern JCA
> providers.  As Matt Sicker pointed out, there are other implementations
> supported by major vendors, like AWS, that may be as fast as a JNI wrapper
> on OpenSSL, or at least close enough not to bother with the added
> complexity of the native stuff.  The only way I know to answer that
> question is to write the code and run a load test comparison.
>
>
> https://aws.amazon.com/blogs/opensource/introducing-amazon-corretto-crypto-provider-accp/
>
> In what forum should we discuss these issues?  Are we limited to this
> distro or do we have other options?  How do we form teams?  What is our
> governance model?  How do we make decisions?
>
> FYI:  There's already issues on the backlog for OpenSSL 3 and HMAC:
> https://issues.apache.org/jira/browse/CRYPTO-165
> https://issues.apache.org/jira/browse/CRYPTO-164
>
> Alex
>
>
> On Wed, Aug 23, 2023 at 10:21 PM Gary Gregory 
> wrote:
>
> > That would be great. I think this is worth the effort. A big item to
> > consider is if and how 1.1 vs 3.0 should be handled. Breakup the current
> > module into different maven modules? Not support both?
> >
> > Gary
> >
> > On Wed, Aug 23, 2023, 8:37 PM Alex Remily  wrote:
> >
> > >  learn
> > > how to implement message digests and HMAC on top of OpenSSL 3.0.8.>
> > >
> > > Implementing the OpenSSL 3 API and exposing OpenSSL HMAC functionality
> in
> > > commons-crypto are things I've wanted to engage on for a while now.  I
> > was
> &

[juk] [Bug 120082] juk random album play oddities

2021-03-09 Thread Marcelo Vanzin
https://bugs.kde.org/show_bug.cgi?id=120082

Marcelo Vanzin  changed:

   What|Removed |Added

 Resolution|--- |LATER
 Status|ASSIGNED|RESOLVED

-- 
You are receiving this mail because:
You are watching all bug changes.

Re: [crypto] New Release

2020-07-24 Thread Marcelo Vanzin
om
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Not sure if it's relevant or not, but to get the build to
> > > compile on
> > > > > > > > Windows with MinGW, I commented out line 137 of
> > > > > > > >
> > > > >
> > >
> > https://github.com/apache/commons-crypto/blob/master/src/main/native/org/apache/commons/crypto/org_apache_commons_crypto.h
> > > > > :
> > > > > > > >
> > > > > > > > //#define inline __inline;
> > > > > > > >
> > > > > > > > I never did learn why it was there in the first place, but the
> > > broken
> > > > > > > > build was originally reported as
> > > > > > > >
> > > > > > > > https://issues.apache.org/jira/browse/CRYPTO-137.
> > > > > > > >
> > > > > > > > Now I'm wondering if it may have had something to do with
> > > > > > > > cross-compiling for the build.
> > > > > > > >
> > > > > > > > On Thu, Jun 25, 2020 at 1:13 PM Geoffrey Blake
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Is there anything needed to help move this release along?
> > > From the
> > > > > > > > > looks of the Makefile, Windows was using GCC.  I don't think
> > > the
> > > > > > > > > compiler is going to have much of an impact since the JNI
> > > bindings
> > > > > are
> > > > > > > > > simply calling through to the OpenSSL library that is already
> > > > > > > > > precompiled for the environment.
> > > > > > > > >
> > > > > > > > > On Sat, Jun 13, 2020 at 6:14 PM Xeno Amess <
> > > xenoam...@gmail.com>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I have a feeling about both mingw or cygwin build output
> > > will be
> > > > > slower
> > > > > > > > > > than microsoft-visual-studio build output...
> > > > > > > > > > Just a feeling, but no evidence.
> > > > > > > > > >
> > > > > > > > > > Alex Remily  于2020年6月14日周日
> > 上午7:02写道:
> > > > > > > > > >
> > > > > > > > > > > I used MinGW64.  It does indeed ship with make.  I can
> > > provide
> > > > > a link
> > > > > > > > > > > to the distribution I used if there's interest.
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jun 13, 2020 at 6:26 PM Marcelo Vanzin <
> > > > > van...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Pretty sure I remember comments in the code about
> > > building
> > > > > with mingw
> > > > > > > > > > > > on Windows (not cygwin). That should have a version of
> > > make,
> > > > > too,
> > > > > > > > > > > > IIRC.
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Jun 13, 2020 at 3:11 PM Gary Gregory <
> > > > > garydgreg...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Except that you can't build on plain Windows because
> > > the
> > > > > build uses
> > > > > > > > > > > make
> > > > > > > > > > > > > and Microsoft version is called nmake. I might have
> > to
> > > > > cobble up some
> > > > > > > > > > > > > cygwin thing...
> > > > > > > > > > > > >
> > > > > > > > > > > > > Gary
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Jun 13, 2020, 18:02 Alex Remily <
> > > > > alex.rem...@gmail.com> wrote:
> > >

Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Marcelo Vanzin
I reviewed the docs and PRs from way before an SPIP was explicitly
asked for, so I'm comfortable with giving a +1 even if I haven't really
fully read the new document.

On Wed, Jul 1, 2020 at 6:05 PM Holden Karau  wrote:
>
> Hi Spark Devs,
>
> I think discussion has settled on the SPIP doc at 
> https://docs.google.com/document/d/1EOei24ZpVvR7_w0BwBjOnrWRy4k-qTdIlx60FsHZSHA/edit?usp=sharing
>  , design doc at 
> https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit,
>  or JIRA https://issues.apache.org/jira/browse/SPARK-20624, and I've received 
> a request to put the SPIP up for a VOTE quickly. The discussion thread on the 
> mailing list is at 
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Graceful-Decommissioning-td29650.html.
>
> Normally this vote would be open for 72 hours, however since it's a long 
> weekend in the US where many of the PMC members are, this vote will not close 
> before July 6th at noon pacific time.
>
> The SPIP procedures are documented at: 
> https://spark.apache.org/improvement-proposals.html. The ASF's voting guide 
> is at https://www.apache.org/foundation/voting.html.
>
> Please vote before July 6th at noon:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don't think this is a good idea because ...
>
> I will start the voting off with a +1 from myself.
>
> Cheers,
>
> Holden



-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[spark] branch master updated (7f6a8ab -> eeb8120)

2020-06-16 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7f6a8ab  [SPARK-31777][ML][PYSPARK] Add user-specified fold column to 
CrossValidator
 add eeb8120  [SPARK-31337][SQL] Support MS SQL Kerberos login in JDBC 
connector

No new revisions were added by this update.

Summary of changes:
 external/docker-integration-tests/pom.xml  |  1 -
 .../sql/jdbc/MsSqlServerIntegrationSuite.scala |  2 +-
 pom.xml|  6 ++
 sql/core/pom.xml   |  5 ++
 .../jdbc/connection/ConnectionProvider.scala   |  4 +
 .../jdbc/connection/MSSQLConnectionProvider.scala  | 97 ++
 .../connection/MariaDBConnectionProvider.scala |  2 +-
 .../connection/MSSQLConnectionProviderSuite.scala  | 51 
 8 files changed, 165 insertions(+), 3 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MSSQLConnectionProvider.scala
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MSSQLConnectionProviderSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-31337) Support MS Sql Kerberos login in JDBC connector

2020-06-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31337.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support MS Sql Kerberos login in JDBC connector
> ---
>
> Key: SPARK-31337
> URL: https://issues.apache.org/jira/browse/SPARK-31337
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [crypto] New Release

2020-06-13 Thread Marcelo Vanzin
NFO] --- maven-antrun-plugin:1.8:run (make) @ commons-crypto ---
> > > > > [INFO] Executing tasks
> > > > >
> > > > > make:
> > > > >  [exec] make native CROSS_PREFIX=arm-linux-gnueabihf-
> > OS_NAME=Linux
> > > > > OS_ARCH=armhf
> > > > >  [exec] make[1]: Entering directory '/mnt/c/git/commons-crypto'
> > > > >  [exec] arm-linux-gnueabihf-gcc -include lib/inc_linux/jni_md.h
> > > > > -I"/usr/lib/jvm/java-1.8.0-openjdk-amd64/include" -O2 -fPIC
> > > > > -fvisibility=hidden -mfloat-abi=hard -Ilib/include -I/usr/include
> > > > > -I"src/main/native/org/apache/commons/crypto/"
> > > > > -I"/usr/lib/jvm/java-1.8.0-openjdk-amd64/include/linux"
> > > > > -I"target/jni-classes/org/apache/commons/crypto/cipher"
> > > > > -I"target/jni-classes/org/apache/commons/crypto/random" -c
> > > > >
> > > >
> > src/main/native/org/apache/commons/crypto/random/OpenSslCryptoRandomNative.c
> > > > > -o
> > > > >
> > > >
> > target/commons-crypto-1.1.0-SNAPSHOT-Linux-armhf/OpenSslCryptoRandomNative.o
> > > > >  [exec] Makefile:54: recipe for target
> > > > >
> > > >
> > 'target/commons-crypto-1.1.0-SNAPSHOT-Linux-armhf/OpenSslCryptoRandomNative.o'
> > > > > failed
> > > > >  [exec] make[1]: Leaving directory '/mnt/c/git/commons-crypto'
> > > > >  [exec] Makefile:105: recipe for target 'linux-armhf' failed
> > > > >  [exec] cc1: fatal error: lib/inc_linux/jni_md.h: No such file or
> > > > > directory
> > > > >  [exec] compilation terminated.
> > > > >  [exec] make[1]: ***
> > > > >
> > > >
> > [target/commons-crypto-1.1.0-SNAPSHOT-Linux-armhf/OpenSslCryptoRandomNative.o]
> > > > > Error 1
> > > > >  [exec] make: *** [linux-armhf] Error 2
> > > > > [INFO]
> > > > >
> > 
> > > > > [INFO] BUILD FAILURE
> > > > > [INFO]
> > > > >
> > 
> > > > > [INFO] Total time:  10.400 s
> > > > > [INFO] Finished at: 2020-06-13T14:16:53Z
> > > > > [INFO]
> > > > >
> > 
> > > > > [ERROR] Failed to execute goal
> > > > > org.apache.maven.plugins:maven-antrun-plugin:1.8:run (make) on
> > project
> > > > > commons-crypto: An Ant BuildException has occured: exec returned: 2
> > > > > [ERROR] around Ant part ... > > > > dir="/mnt/c/git/commons-crypto" executable="make">... @ 5:78 in
> > > > > /mnt/c/git/commons-crypto/target/antrun/build-make.xml
> > > > >
> > > > > For the  linux-aarch64 profile, I need to install the proper package:
> > > > >
> > > > > [INFO] --- maven-antrun-plugin:1.8:run (make) @ commons-crypto ---
> > > > > [INFO] Executing tasks
> > > > >
> > > > > make:
> > > > >  [exec] make native CROSS_PREFIX=aarch64-linux-gnu- OS_NAME=Linux
> > > > > OS_ARCH=aarch64
> > > > >  [exec] make[1]: Entering directory '/mnt/c/git/commons-crypto'
> > > > >  [exec] aarch64-linux-gnu-gcc -Ilib/inc_linux
> > > > > -I"/usr/lib/jvm/java-1.8.0-openjdk-amd64/include" -Ilib/inc_mac -O2
> > -fPIC
> > > > > -fvisibility=hidden -Wall -Werror -Ilib/include -I/usr/include
> > > > > -I"src/main/native/org/apache/commons/crypto/"
> > > > > -I"/usr/lib/jvm/java-1.8.0-openjdk-amd64/include/linux"
> > > > > -I"target/jni-classes/org/apache/commons/crypto/cipher"
> > > > > -I"target/jni-classes/org/apache/commons/crypto/random" -c
> > > > >
> > > >
> > src/main/native/org/apache/commons/crypto/random/OpenSslCryptoRandomNative.c
> > > > > -o
> > > > >
> > > >
> > target/commons-crypto-1.1.0-SNAPSHOT-Linux-aarch64/OpenSslCryptoRandomNative.o
> > > > >  [exec] Makefile:54: recipe for target
> > > > >
> > > >
> > 'target/commons-crypto-1.1.0-SNAPSHOT-Linux-aarch64/OpenSslCryptoRandomNative.o'
> > > > > failed
> > > > >  [exec] make[1]: Leaving directory '/mnt/c/git/commons-crypto'
> > > > >  [exec] Makefile:109: recipe for target 'linux-aarch64' failed
> > > > >  [exec] In file included from
> > > > >
> > > >
> > src/main/native/org/apache/commons/crypto/org_apache_commons_crypto.h:233:0,
> > > > >  [exec]  from
> > > > >
> > > >
> > src/main/native/org/apache/commons/crypto/random/org_apache_commons_crypto_random.h:22,
> > > > >  [exec]  from
> > > > >
> > > >
> > src/main/native/org/apache/commons/crypto/random/OpenSslCryptoRandomNative.c:19:
> > > > >  [exec] /usr/include/openssl/aes.h:13:11: fatal error:
> > > > > openssl/opensslconf.h: No such file or directory
> > > > >  [exec]  # include 
> > > > >  [exec]^~~
> > > > >  [exec] compilation terminated.
> > > > >  [exec] make[1]: ***
> > > > >
> > > >
> > [target/commons-crypto-1.1.0-SNAPSHOT-Linux-aarch64/OpenSslCryptoRandomNative.o]
> > > > > Error 1
> > > > >  [exec] make: *** [linux-aarch64] Error 2
> > > > > [INFO]
> > > > >
> > 
> > > > > [INFO] BUILD FAILURE
> > > > > [INFO]
> > > > >
> > 
> > > > > [INFO] Total time:  10.943 s
> > > > > [INFO] Finished at: 2020-06-13T14:19:44Z
> > > > > [INFO]
> > > > >
> > 
> > > > > [ERROR] Failed to execute goal
> > > > > org.apache.maven.plugins:maven-antrun-plugin:1.8:run (make) on
> > project
> > > > > commons-crypto: An Ant BuildException has occured: exec returned: 2
> > > > > [ERROR] around Ant part ... > > > > dir="/mnt/c/git/commons-crypto" executable="make">... @ 5:78 in
> > > > > /mnt/c/git/commons-crypto/target/antrun/build-make.xml
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > Gary
> > > > >
> > > > > On Fri, Jun 12, 2020 at 8:41 PM Alex Remily 
> > > > wrote:
> > > > >
> > > > > > Just checking in on the status of the 1.1 release.
> > > > > >
> > > > > >
> > -
> > > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Matt Sicker 
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > >
> > > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >



-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[spark] branch branch-3.0 updated: [SPARK-31559][YARN] Re-obtain tokens at the startup of AM for yarn cluster mode if principal and keytab are available

2020-05-11 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 9469831  [SPARK-31559][YARN] Re-obtain tokens at the startup of AM for 
yarn cluster mode if principal and keytab are available
9469831 is described below

commit 9469831c3751e898ebe78cb642266b50ea167f22
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Mon May 11 17:25:41 2020 -0700

[SPARK-31559][YARN] Re-obtain tokens at the startup of AM for yarn cluster 
mode if principal and keytab are available

### What changes were proposed in this pull request?

This patch re-obtains tokens at the start of the AM for yarn cluster mode, if 
principal and keytab are available. It basically transfers the credentials from 
the original user, so this patch puts the new tokens into the original user's 
credentials via overwriting.

To obtain tokens from providers in user application, this patch leverages 
the user classloader as context classloader while initializing token manager in 
the startup of AM.

### Why are the changes needed?

Submitter will obtain delegation tokens for yarn-cluster mode, and add 
these credentials to the launch context. AM will be launched with these 
credentials, and AM and driver are able to leverage these tokens.

In Yarn cluster mode, the driver is launched in the AM, which in turn 
initializes the token manager (while initializing SparkContext) and obtains 
delegation tokens (+ schedules renewal) if both principal and keytab are 
available.

That said, even if we provide a principal and keytab to run an application in 
yarn-cluster mode, the AM always starts with the initial tokens from the launch 
context until the token manager runs and obtains delegation tokens.

So there's a "gap": if user code (the driver) accesses an external system with 
delegation tokens (e.g. HDFS) before initializing SparkContext, it cannot 
leverage the tokens the token manager will obtain. This will make the 
application fail if the AM is killed "after" the initial tokens have expired 
and is relaunched.

This is even a regression; see the code below in branch-2.4:


https://github.com/apache/spark/blob/branch-2.4/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala


https://github.com/apache/spark/blob/branch-2.4/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/AMCredentialRenewer.scala

In Spark 2.4.x, AM runs AMCredentialRenewer at initialization, and 
AMCredentialRenew obtains tokens and merge with credentials being provided with 
launch context of AM. So it guarantees new tokens in driver run.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested with specifically crafted application (simple reproducer) - 
https://github.com/HeartSaVioR/spark-delegation-token-experiment/blob/master/src/main/scala/net/heartsavior/spark/example/LongRunningAppWithHDFSConfig.scala

Before this patch, new AM attempt failed when I killed AM after the 
expiration of tokens. After this patch the new AM attempt runs fine.

Closes #28336 from HeartSaVioR/SPARK-31559.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit 842b1dcdff0ecab4af9f292c2ff7b2b9ae1ac40a)
Signed-off-by: Marcelo Vanzin 
---
 .../org/apache/spark/deploy/yarn/ApplicationMaster.scala  | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index 1e8f408..862acd8 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -42,6 +42,7 @@ import org.apache.hadoop.yarn.util.{ConverterUtils, Records}
 import org.apache.spark._
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.deploy.history.HistoryServer
+import org.apache.spark.deploy.security.HadoopDelegationTokenManager
 import org.apache.spark.deploy.yarn.config._
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config._
@@ -860,10 +861,22 @@ object ApplicationMaster extends Logging {
 val ugi = sparkConf.get(PRINCIPAL) match {
   // We only need to log in with the keytab in cluster mode. In client 
mode, the driver
   // handles the user keytab.
-  case Some(principal) if amArgs.userClass != null =>
+  case Some(principal) if master.isClusterMode =>
 val originalCreds = 
UserGroupInformation.getCu

[spark] branch master updated (64fb358 -> 842b1dc)

2020-05-11 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 64fb358  [SPARK-31671][ML] Wrong error message in VectorAssembler
 add 842b1dc  [SPARK-31559][YARN] Re-obtain tokens at the startup of AM for 
yarn cluster mode if principal and keytab are available

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/yarn/ApplicationMaster.scala  | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-31559) AM starts with initial fetched tokens in any attempt

2020-05-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31559.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> AM starts with initial fetched tokens in any attempt
> 
>
> Key: SPARK-31559
> URL: https://issues.apache.org/jira/browse/SPARK-31559
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> The issue only occurs in yarn-cluster mode.
> The submitter will obtain delegation tokens for yarn-cluster mode and add these 
> credentials to the launch context. The AM will be launched with these 
> credentials, and the AM and driver are able to leverage these tokens.
> In Yarn cluster mode, the driver is launched in the AM, which in turn initializes 
> the token manager (while initializing SparkContext) and obtains delegation tokens 
> (+ schedules renewal) if both principal and keytab are available.
> That said, even if we provide a principal and keytab to run an application in 
> yarn-cluster mode, the AM always starts with the initial tokens from the launch 
> context until the token manager runs and obtains delegation tokens.
> So there's a "gap": if user code (the driver) accesses an external system with 
> delegation tokens (e.g. HDFS) before initializing SparkContext, it cannot 
> leverage the tokens the token manager will obtain. This will make the application 
> fail if the AM is killed "after" the initial tokens have expired and is relaunched.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[commons-crypto] branch master updated: Update pom.xml to explicitly prevent cleaning during release:prepare

2020-05-07 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-crypto.git


The following commit(s) were added to refs/heads/master by this push:
 new e962854  Update pom.xml to explicitly prevent cleaning during 
release:prepare
e962854 is described below

commit e96285462b86798e7a76defdecf16dbbf4f4b8a8
Author: Geoffrey Blake 
AuthorDate: Thu May 7 20:12:43 2020 -0500

Update pom.xml to explicitly prevent cleaning during release:prepare

Update pom.xml to configure the maven-release-plugin to not perform a clean 
when doing a release:prepare, to allow build artifacts from other platform 
builds to be incorporated into the final jar for publishing to the Maven 
artifact repositories.
---
 pom.xml | 8 
 1 file changed, 8 insertions(+)

diff --git a/pom.xml b/pom.xml
index 90d6878..0cafbee 100644
--- a/pom.xml
+++ b/pom.xml
@@ -610,6 +610,14 @@ The following provides more details on the included 
cryptographic software:
   
${basedir}/clirr-excludes.xml
 
   
+  <plugin>
+    <groupId>org.apache.maven.plugins</groupId>
+    <artifactId>maven-release-plugin</artifactId>
+    <version>2.5.3</version>
+    <configuration>
+      <preparationGoals>verify</preparationGoals>
+    </configuration>
+  </plugin>
 
   
   



[Kernel-packages] [Bug 1872001] Re: 5.3.0-46-generic - i915 - frequent GPU hangs / resets rcs0

2020-05-07 Thread Marcelo Vanzin
I was also running into pretty horrendous graphics perf (with bionic),
and fixed it after I did the following (I did them all at once, so I'm not
sure which one did it):

- updated to the 5.3.0-51 kernel
- set the java2d option to opengl (from #82) - although I had issues in other 
apps too
- updated the Xorg server to xserver-xorg-hwe-18.04 (+ other hwe drivers)

My guess is that the last one is what did the trick. Much better for the
last couple of days.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1872001

Title:
  5.3.0-46-generic - i915 - frequent GPU hangs  / resets rcs0

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Eoan:
  Confirmed

Bug description:
  Hi,

  Since updating to HWE kernel 5.3.0-46-generic I am experiencing frequent
  (every couple of minutes) GPU hangs and resets, manifesting as 2-3
  second freezes of the GUI (other than the mouse pointer).

  No particular triggers have been identified, although having Chrome / Chromium
  running with hardware acceleration enabled does appear to increase the
  frequency.

  I have seen incidences of these hangs in journalctl output using
  previous kernels in the 5.3.0-xx series, but they were very infrequent
  (once or twice in a week of daily usage).

  System Info
  steve@steve-Inspiron-5580:~$ inxi -SCGxxxz
  System:Host: steve-Inspiron-5580 Kernel: 5.3.0-46-generic x86_64 bits: 64 
compiler: gcc v: 7.5.0
     Desktop: Cinnamon 4.4.8 wm: muffin 4.4.2 dm: LightDM 1.26.0 
Distro: Linux Mint 19.3 Tricia
     base: Ubuntu 18.04 bionic
  CPU:   Topology: Quad Core model: Intel Core i5-8265U bits: 64 type: MT 
MCP arch: Kaby Lake rev: B
     L2 cache: 6144 KiB
     flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx 
bogomips: 28800
     Speed: 1173 MHz min/max: 400/3900 MHz Core speeds (MHz): 1: 800 2: 
800 3: 800 4: 800 5: 800 6: 800 7: 800
     8: 800
  Graphics:  Device-1: Intel vendor: Dell driver: i915 v: kernel bus ID: 
00:02.0 chip ID: 8086:3ea0
     Display: x11 server: X.Org 1.20.5 driver: modesetting unloaded: 
fbdev,vesa resolution: 1920x1080~60Hz
     OpenGL: renderer: Mesa DRI Intel UHD Graphics (Whiskey Lake 3x8 
GT2) v: 4.5 Mesa 19.2.8 compat-v: 3.0
     direct render: Yes

  steve@steve-Inspiron-5580:~$ journalctl -b | grep i915
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: i915 :00:02.0: vgaarb: 
deactivate vga console
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: i915 :00:02.0: vgaarb: 
changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: [drm] Finished loading DMC 
firmware i915/kbl_dmc_ver1_04.bin (v1.4)
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: [drm] Initialized i915 1.6.0 
20190619 for :00:02.0 on minor 0
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: fbcon: i915drmfb (fb0) is primary 
device
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: i915 :00:02.0: fb0: i915drmfb 
frame buffer device
  Apr 10 06:15:17 steve-Inspiron-5580 kernel: snd_hda_intel :00:1f.3: bound 
:00:02.0 (ops i915_audio_component_bind_ops [i915])
  Apr 10 06:16:28 steve-Inspiron-5580 kernel: i915 :00:02.0: GPU HANG: 
ecode 9:0:0x, hang on rcs0
  Apr 10 06:16:28 steve-Inspiron-5580 kernel: i915 :00:02.0: Resetting rcs0 
for hang on rcs0
  Apr 10 06:31:46 steve-Inspiron-5580 kernel: i915 :00:02.0: Resetting rcs0 
for hang on rcs0
  Apr 10 06:37:48 steve-Inspiron-5580 kernel: i915 :00:02.0: Resetting rcs0 
for hang on rcs0
  Apr 10 06:40:46 steve-Inspiron-5580 kernel: i915 :00:02.0: Resetting rcs0 
for hang on rcs0

  I note another user has reported similar issues on the same kernel at 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861395/comments/52
  ---
  ProblemType: Bug
  ApportVersion: 2.20.9-0ubuntu7.14
  Architecture: amd64
  AudioDevicesInUse:
   USERPID ACCESS COMMAND
   /dev/snd/controlC0:  steve  3920 F pulseaudio
   /dev/snd/pcmC0D0p:   steve  3920 F...m pulseaudio
  CurrentDesktop: X-Cinnamon
  DistroRelease: Linux Mint 19.3
  HibernationDevice: RESUME=none
  InstallationDate: Installed on 2019-12-27 (104 days ago)
  InstallationMedia: Linux Mint 19.3 "Tricia" - Release amd64 20191213
  MachineType: Dell Inc. Inspiron 5580
  Package: linux (not installed)
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-46-generic 
root=UUID=b0eaa5bb-0276-42d4-938f-ee6ce1627906 ro usb_storage.quirks=0bc2:2320: 
quiet splash vt.handoff=1
  ProcVersionSignature: Ubuntu 5.3.0-46.38~18.04.1-generic 5.3.18
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-46-generic N/A
   linux-backports-modules-5.3.0-46-generic  N/A
   linux-firmware1.173.17
  Tags:  tricia
  Uname: Linux 5.3.0-46-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  

[commons-crypto] branch master updated: JaCoCo Increase for Streams (#99)

2020-05-06 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-crypto.git


The following commit(s) were added to refs/heads/master by this push:
 new d8cbe88  JaCoCo Increase for Streams (#99)
d8cbe88 is described below

commit d8cbe8860d6f23548522e1bf5e8c74d5fd89122e
Author: aremily 
AuthorDate: Wed May 6 20:37:37 2020 -0400

JaCoCo Increase for Streams (#99)
---
 .../crypto/stream/AbstractCipherStreamTest.java| 399 +++--
 .../commons/crypto/stream/CtrCryptoStreamTest.java | 134 +++
 .../stream/PositionedCryptoInputStreamTest.java|  37 +-
 3 files changed, 540 insertions(+), 30 deletions(-)

diff --git 
a/src/test/java/org/apache/commons/crypto/stream/AbstractCipherStreamTest.java 
b/src/test/java/org/apache/commons/crypto/stream/AbstractCipherStreamTest.java
index 344fc0d..8d585a6 100644
--- 
a/src/test/java/org/apache/commons/crypto/stream/AbstractCipherStreamTest.java
+++ 
b/src/test/java/org/apache/commons/crypto/stream/AbstractCipherStreamTest.java
@@ -17,6 +17,8 @@
  */
 package org.apache.commons.crypto.stream;
 
+import static org.junit.Assert.assertEquals;
+
 import java.io.BufferedInputStream;
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
@@ -27,10 +29,13 @@ import java.io.OutputStream;
 import java.nio.ByteBuffer;
 import java.nio.channels.Channels;
 import java.nio.channels.ReadableByteChannel;
+import java.security.Key;
 import java.security.SecureRandom;
+import java.security.spec.AlgorithmParameterSpec;
 import java.util.Properties;
 import java.util.Random;
 
+import javax.crypto.spec.GCMParameterSpec;
 import javax.crypto.spec.IvParameterSpec;
 import javax.crypto.spec.SecretKeySpec;
 
@@ -44,9 +49,9 @@ import org.junit.Test;
 
 public abstract class AbstractCipherStreamTest {
 
-private final int dataLen = 2;
-private final byte[] data = new byte[dataLen];
-private byte[] encData;
+protected final int dataLen = 2;
+protected final byte[] data = new byte[dataLen];
+protected byte[] encData;
 private final Properties props = new Properties();
 protected byte[] key = new byte[16];
 protected byte[] iv = new byte[16];
@@ -98,6 +103,26 @@ public abstract class AbstractCipherStreamTest {
 doByteBufferWrite(AbstractCipherTest.JCE_CIPHER_CLASSNAME, baos, true);
 doByteBufferWrite(AbstractCipherTest.OPENSSL_CIPHER_CLASSNAME, baos, 
true);
 }
+
+@Test(timeout = 12)
+public void testExceptions() throws Exception {
+final ByteArrayOutputStream baos = new ByteArrayOutputStream();
+doExceptionTest(AbstractCipherTest.JCE_CIPHER_CLASSNAME, baos, false);
+doExceptionTest(AbstractCipherTest.OPENSSL_CIPHER_CLASSNAME, baos, 
false);
+
+doExceptionTest(AbstractCipherTest.JCE_CIPHER_CLASSNAME, baos, true);
+doExceptionTest(AbstractCipherTest.OPENSSL_CIPHER_CLASSNAME, baos, 
true);
+}
+
+@Test(timeout = 12)
+public void testFieldGetters() throws Exception {
+final ByteArrayOutputStream baos = new ByteArrayOutputStream();
+doFieldGetterTest(AbstractCipherTest.JCE_CIPHER_CLASSNAME, baos, 
false);
+doFieldGetterTest(AbstractCipherTest.OPENSSL_CIPHER_CLASSNAME, baos, 
false);
+
+doFieldGetterTest(AbstractCipherTest.JCE_CIPHER_CLASSNAME, baos, true);
+doFieldGetterTest(AbstractCipherTest.OPENSSL_CIPHER_CLASSNAME, baos, 
true);
+}
 
 protected void doSkipTest(final String cipherClass, final boolean 
withChannel)
 throws IOException {
@@ -110,9 +135,11 @@ public abstract class AbstractCipherStreamTest {
 new ByteArrayInputStream(encData), getCipher(cipherClass),
 defaultBufferSize, iv, withChannel)) {
 final byte[] result = new byte[dataLen];
-final int n1 = readAll(in, result, 0, dataLen / 3);
+final int n1 = readAll(in, result, 0, dataLen / 5);
 
-long skipped = in.skip(dataLen / 3);
+Assert.assertEquals(in.skip(0), 0);
+
+long skipped = in.skip(dataLen / 5);
 final int n2 = readAll(in, result, 0, dataLen);
 
 Assert.assertEquals(dataLen, n1 + skipped + n2);
@@ -197,46 +224,261 @@ public abstract class AbstractCipherStreamTest {
 getCipher(cipherClass), smallBufferSize, iv, withChannel);
 buf.clear();
 byteBufferReadCheck(in, buf, 11);
+in.close();  
+
+// Direct buffer, small buffer size, initial buffer position is 0, 
final read
+in = getCryptoInputStream(new ByteArrayInputStream(encData),
+getCipher(cipherClass), smallBufferSize, iv, withChannel);
+buf.clear();
+byteBufferFinalReadCheck(in, buf, 0);
+in.close();
+
+// Default buffer size, initial buffer position is 0, insufficient 
dest

[commons-crypto] branch master updated: Additional unit tests for JNA, Cipher, Random, Utils testing error inputs (#97)

2020-04-23 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-crypto.git


The following commit(s) were added to refs/heads/master by this push:
 new 218b3db  Additional unit tests for JNA, Cipher, Random, Utils testing 
error inputs (#97)
218b3db is described below

commit 218b3db4149a53397f1e726d37cc3928e2e2135c
Author: Geoffrey Blake 
AuthorDate: Thu Apr 23 22:34:42 2020 -0500

Additional unit tests for JNA, Cipher, Random, Utils testing error inputs 
(#97)

Add unit tests that exercise error paths in the JNA, Cipher, Random, and Utils 
parts of the package. This raises overall coverage by a few percentage points. 
The tests reach as much of the error-checking code deep in the library as possible.

Tested on MacOS AMD64, RHEL8 x86 and RHEL8 aarch64.
---
 .../commons/crypto/jna/OpenSslJnaCipher.java   |  9 ++-
 .../commons/crypto/cipher/AbstractCipherTest.java  | 66 +-
 .../commons/crypto/cipher/OpenSslCipherTest.java   |  9 +++
 .../commons/crypto/jna/OpenSslJnaCipherTest.java   |  6 +-
 .../crypto/random/CryptoRandomFactoryTest.java | 16 ++
 .../org/apache/commons/crypto/utils/UtilsTest.java | 12 
 6 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/src/main/java/org/apache/commons/crypto/jna/OpenSslJnaCipher.java 
b/src/main/java/org/apache/commons/crypto/jna/OpenSslJnaCipher.java
index bc63d11..1c68add 100644
--- a/src/main/java/org/apache/commons/crypto/jna/OpenSslJnaCipher.java
+++ b/src/main/java/org/apache/commons/crypto/jna/OpenSslJnaCipher.java
@@ -51,6 +51,7 @@ class OpenSslJnaCipher implements CryptoCipher {
 private final AlgorithmMode algMode;
 private final int padding;
 private final String transformation;
+private final int IV_LENGTH = 16;
 
 /**
  * Constructs a {@link CryptoCipher} using JNA into OpenSSL
@@ -104,7 +105,13 @@ class OpenSslJnaCipher implements CryptoCipher {
 throw new InvalidAlgorithmParameterException("Illegal parameters");
 }
 
-   if(algMode == AlgorithmMode.AES_CBC) {
+if ((algMode == AlgorithmMode.AES_CBC || 
+ algMode == AlgorithmMode.AES_CTR) 
+&& iv.length != IV_LENGTH) {
+throw new InvalidAlgorithmParameterException("Wrong IV length: 
must be 16 bytes long");
+}
+
+if(algMode == AlgorithmMode.AES_CBC) {
 switch (key.getEncoded().length) {
 case 16: algo = OpenSslNativeJna.EVP_aes_128_cbc(); break;
 case 24: algo = OpenSslNativeJna.EVP_aes_192_cbc(); break;
diff --git 
a/src/test/java/org/apache/commons/crypto/cipher/AbstractCipherTest.java 
b/src/test/java/org/apache/commons/crypto/cipher/AbstractCipherTest.java
index f14804a..b87c6d5 100644
--- a/src/test/java/org/apache/commons/crypto/cipher/AbstractCipherTest.java
+++ b/src/test/java/org/apache/commons/crypto/cipher/AbstractCipherTest.java
@@ -19,7 +19,11 @@ package org.apache.commons.crypto.cipher;
 
 import static org.junit.Assert.assertNotNull;
 
+import java.lang.reflect.InvocationTargetException;
 import java.nio.ByteBuffer;
+import java.security.InvalidAlgorithmParameterException;
+import java.security.InvalidKeyException;
+import java.security.NoSuchAlgorithmException;
 import java.security.SecureRandom;
 import java.util.Properties;
 import java.util.Random;
@@ -27,6 +31,7 @@ import java.util.Random;
 import javax.crypto.Cipher;
 import javax.crypto.spec.IvParameterSpec;
 import javax.crypto.spec.SecretKeySpec;
+import javax.crypto.spec.GCMParameterSpec;
 import javax.xml.bind.DatatypeConverter;
 
 import org.apache.commons.crypto.utils.ReflectionUtils;
@@ -51,7 +56,8 @@ public abstract class AbstractCipherTest {
 
 // cipher
 static final byte[] KEY = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
-0x09, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16 };
+0x09, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16,
+0x17, 0x18, 0x19, 0x20, 0x21, 0x22, 0x23, 0x24};
 static final byte[] IV = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
 private CryptoCipher enc, dec;
@@ -147,6 +153,62 @@ public abstract class AbstractCipherTest {
 }
 }
 
+@Test(expected = RuntimeException.class)
+public void testNullTransform() throws Exception {
+getCipher(null);
+}
+
+@Test(expected = RuntimeException.class)
+public void testInvalidTransform() throws Exception {
+getCipher("AES/CBR/NoPadding/garbage/garbage");
+}
+
+@Test
+public void testInvalidKey() throws Exception {
+for (String transform : transformations) {
+try {
+final CryptoCipher cipher = getCipher(transform);
+Assert.assertNotNull(cipher);
+
+final byte[] invalidKey =

[spark] branch master updated (54b97b2 -> c619990)

2020-04-22 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 54b97b2  [MINOR][DOCS] Fix a typo in ContainerPlacementStrategy's 
class comment
 add c619990  [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC 
connector

No new revisions were added by this update.

Summary of changes:
 external/docker-integration-tests/pom.xml  | 13 ++--
 .../src/test/resources/db2_krb_setup.sh| 23 ++
 .../spark/sql/jdbc/DB2IntegrationSuite.scala   |  1 -
 .../spark/sql/jdbc/DB2KrbIntegrationSuite.scala| 89 ++
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala  | 28 +--
 .../sql/jdbc/DockerKrbJDBCIntegrationSuite.scala   |  4 +-
 .../sql/jdbc/MariaDBKrbIntegrationSuite.scala  |  1 -
 .../sql/jdbc/PostgresKrbIntegrationSuite.scala |  1 -
 pom.xml|  6 ++
 sql/core/pom.xml   |  5 ++
 .../jdbc/connection/BasicConnectionProvider.scala  |  8 +-
 .../jdbc/connection/ConnectionProvider.scala   | 10 +++
 ...nProvider.scala => DB2ConnectionProvider.scala} | 43 +++
 .../connection/MariaDBConnectionProvider.scala | 11 +--
 .../connection/PostgresConnectionProvider.scala|  9 +--
 .../jdbc/connection/SecureConnectionProvider.scala | 12 +++
 ...uite.scala => DB2ConnectionProviderSuite.scala} |  6 +-
 17 files changed, 204 insertions(+), 66 deletions(-)
 copy dev/sbt-checkstyle => 
external/docker-integration-tests/src/test/resources/db2_krb_setup.sh (58%)
 create mode 100644 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala
 copy 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/{PostgresConnectionProvider.scala
 => DB2ConnectionProvider.scala} (51%)
 copy 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/{MariaDBConnectionProviderSuite.scala
 => DB2ConnectionProviderSuite.scala} (80%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-31272) Support DB2 Kerberos login in JDBC connector

2020-04-22 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31272.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support DB2 Kerberos login in JDBC connector
> 
>
> Key: SPARK-31272
> URL: https://issues.apache.org/jira/browse/SPARK-31272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2020-04-21 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089190#comment-17089190
 ] 

Marcelo Masiero Vanzin commented on SPARK-1537:
---

Well, the only thing to start with is the existing SHS code. 
EventLoggingListener + FsHistoryProvider.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>    Reporter: Marcelo Masiero Vanzin
>Priority: Major
> Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[commons-crypto] branch master updated: CI Improvements (#96)

2020-04-21 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-crypto.git


The following commit(s) were added to refs/heads/master by this push:
 new 2489786  CI Improvements (#96)
2489786 is described below

commit 24897862c41a504a6987a65727669268019f6f2f
Author: Adam Retter 
AuthorDate: Wed Apr 22 01:36:43 2020 +0200

CI Improvements (#96)

* Tidies up the Travis CI configuration
* Adds Arm64 builds to Travis CI
* Adds ppc64le builds to Travis CI
---
 .travis.yml | 98 -
 pom.xml |  4 +++
 2 files changed, 81 insertions(+), 21 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 796de8e..2bd7c8c 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -18,28 +18,84 @@ language: java
 
 matrix:
   include:
-- name: "Ubuntu 14.04 / Java 8 / OpenSSL 1.0.x"
+
+- name: "x64 / Ubuntu 14.04 / Java 8 / OpenSSL 1.0.x"
+  arch: amd64
   os: linux
   dist: trusty
-  before_install:
-- "curl -L --cookie 'oraclelicense=accept-securebackup-cookie;'  
http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip -o 
/tmp/policy.zip && sudo unzip -j -o /tmp/policy.zip *.jar -d `jdk_switcher home 
oraclejdk8`/jre/lib/security && rm /tmp/policy.zip"
-- openssl version -a
-  after_success:
-- mvn clean test jacoco:report coveralls:report
-- name: "OS X / Java 8 / LibreSSL"
+  jdk: openjdk8
+
+- name: "x64 / Ubuntu 18.04 / Java 8 / OpenSSL 1.1.x"
+  arch: amd64
+  os: linux
+  dist: bionic
+  jdk: openjdk8
+
+- name: "aarch64 / Ubuntu 16.04 / Java 8 / OpenSSL 1.0.x"
+  arch: arm64
+  os: linux
+  dist: xenial
+  jdk: openjdk8
+  env:
+- JAVA_HOME=/usr/lib/jvm/adoptopenjdk-8-hotspot-arm64
+  addons:
+apt:
+  packages:
+- maven
+
+- name: "aarch64 / Ubuntu 18.04 / Java 8 / OpenSSL 1.1.x"
+  arch: arm64
+  os: linux
+  dist: bionic
+  jdk: openjdk8
+  env:
+- JAVA_HOME=/usr/lib/jvm/adoptopenjdk-8-hotspot-arm64
+  addons:
+apt:
+  packages:
+- maven
+
+- name: "ppc64le / Ubuntu 16.04 / Java 8 / OpenSSL 1.0.x"
+  arch: ppc64le
+  os: linux
+  dist: xenial
+  jdk: openjdk8
+  env:
+- JAVA_HOME=/usr/lib/jvm/adoptopenjdk-8-hotspot-ppc64el
+  addons:
+apt:
+  packages:
+- maven
+
+- name: "ppc64le / Ubuntu 18.04 / Java 8 / OpenSSL 1.1.x"
+  arch: ppc64le
+  os: linux
+  dist: bionic
+  jdk: openjdk8
+  env:
+- JAVA_HOME=/usr/lib/jvm/adoptopenjdk-8-hotspot-ppc64el
+  addons:
+apt:
+  packages:
+- maven
+
+- name: "OS X / Java 8 / LibreSSL 2.2.x"
   os: osx
   osx_image: xcode9.3
-  before_install:
-- "curl -L --cookie 'oraclelicense=accept-securebackup-cookie;'  
http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip -o 
/tmp/policy.zip && sudo unzip -j -o /tmp/policy.zip *.jar -d 
/Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home/jre/lib/security
 && rm /tmp/policy.zip"
-- openssl version -a
-  after_success:
-- mvn clean test jacoco:report coveralls:report
-  
-jdk:
-  - oraclejdk8
-
-script:
-  - mvn apache-rat:check
-  - mvn verify
-  - mvn site
-  - mvn clirr:check
+  jdk: oraclejdk8
+  env:
+- JAVA_HOME=$(/usr/libexec/java_home)
+
+before_install:
+  - |
+curl -L --cookie 'oraclelicense=accept-securebackup-cookie;' 
http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip -o 
/tmp/policy.zip
+sudo unzip -j -o /tmp/policy.zip *.jar -d $JAVA_HOME/jre/lib/security
+rm /tmp/policy.zip
+  - openssl version -a
+install: mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V
+script: mvn test jacoco:report coveralls:report -B -V
+after_success: mvn site -B -V
+
+cache:
+  directories:
+- $HOME/.m2
diff --git a/pom.xml b/pom.xml
index 2202702..90d6878 100644
--- a/pom.xml
+++ b/pom.xml
@@ -256,6 +256,10 @@ The following provides more details on the included 
cryptographic software:
   Tian Jianguo
   jianguo.t...@intel.com
 
+
+  Adam Retter
+  Evolved Binary
+
   
 
   



[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2020-04-21 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089137#comment-17089137
 ] 

Marcelo Masiero Vanzin commented on SPARK-1537:
---

[~templedf] sorry forgot to reply.

ATSv1 wasn't a good match for this, and by the time ATSv2 was developed, 
interest in this feature had long since faded in the Spark community. So this 
was closed.

Also you probably can do this without requiring the code to live in Spark.

But if you actually want to contribute the integration, there's nothing 
preventing you from opening a new bug and posting a PR.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>    Reporter: Marcelo Masiero Vanzin
>Priority: Major
> Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [crypto] Help Releasing new Commons Crypto

2020-04-12 Thread Marcelo Vanzin
On Sun, Apr 12, 2020 at 6:13 PM Gary Gregory  wrote:
> > I can do the 64 bit builds on Mac, Linux and Windows, so I'm happy to
> > provide whichever of those is required.  It seems that Geoff can do the
> > arm64 build.  Do we even bother supporting 32 bit architectures at this
> > point?
>
> Unfortunately, we cannot just pick up bits from folks here and there. It
> all has to be buildable from Maven by the release manager in order to
> generate the file signatures properly.

If the only concern is the file signatures, I'm almost sure that the
maven build will pick up native libraries you drop in some directory
and include them in the jar, regardless of whether they're for the
current architecture. Then you'd get the jar with all native libraries
and the right signature.

$ mkdir -p target/classes/org/apache/commons/crypto/native/Linux/x86
$ touch !$/fake.so
touch target/classes/org/apache/commons/crypto/native/Linux/x86/fake.so
$ mvn package
$ jar tf target/commons-crypto-1.1.0-SNAPSHOT.jar  | grep fake.so
org/apache/commons/crypto/native/Linux/x86/fake.so

Frankly, given you need things built on different platforms, that
seems like the easiest way to get a jar with multiple native libraries
- setting up cross-compilers would be kind of a pain.

Now if the requirement is that the release manager needs to build
everything himself, then that makes it a little more complicated.

-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [crypto] Help Releasing new Commons Crypto

2020-04-12 Thread Marcelo Vanzin
Hi Gary,

On Sun, Apr 12, 2020 at 8:53 AM Gary Gregory  wrote:
> > The 1.0 release on maven central only included linux32 and linux64 native
> > libs, even though the Makefile supports many more targets
> >
>
> Please see the snapshot builds which now include more:
> https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-crypto/1.1.0-SNAPSHOT

Here's the native stuff in your snapshot jar:

$ jar tf commons-crypto-1.1.0-20200411.124009-5.jar | grep native
org/apache/commons/crypto/native/
org/apache/commons/crypto/native/Linux/
org/apache/commons/crypto/native/Linux/x86_64/
org/apache/commons/crypto/native/Linux/x86_64/libcommons-crypto.so

Here's the 1.0 release:

$ jar tf 
~/.ivy2/cache/org.apache.commons/commons-crypto/jars/commons-crypto-1.0.0.jar
| grep native
org/apache/commons/crypto/native/
org/apache/commons/crypto/native/Linux/
org/apache/commons/crypto/native/Linux/x86/
org/apache/commons/crypto/native/Linux/x86_64/
org/apache/commons/crypto/native/Mac/
org/apache/commons/crypto/native/Mac/x86_64/
org/apache/commons/crypto/native/Windows/
org/apache/commons/crypto/native/Windows/x86/
org/apache/commons/crypto/native/Windows/x86_64/
org/apache/commons/crypto/native/Linux/x86/libcommons-crypto.so
org/apache/commons/crypto/native/Linux/x86_64/libcommons-crypto.so
org/apache/commons/crypto/native/Mac/x86_64/libcommons-crypto.jnilib
org/apache/commons/crypto/native/Windows/x86/commons-crypto.dll
org/apache/commons/crypto/native/Windows/x86_64/commons-crypto.dll

That's the only thing that worries me: finding someone who can build
all those extra native libraries. I tend to agree that linux64 is the
most important one, but it would be technically a regression from 1.0
to skip the others.

That being said, if we can't solve that, I think it's better to
release something rather than nothing.

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [crypto] Help Releasing new Commons Crypto

2020-04-10 Thread Marcelo Vanzin
(-CCs)

Hi Gary, thanks for volunteering to help. If needed I can try helping with
the release process (e.g. uploading artifacts), but I don't really have the
ability to build anything but the linux64 native lib (maybe the linux32 one
if I can find the right packages or docker image).

I looked at the 1.0 release and it seems to include linux32/64, win32/64
and macos64.

src/changes/changes.xml probably needs some love too.


On Fri, Apr 10, 2020 at 5:02 PM Gary Gregory  wrote:

> I'll see what I can do over the weekend.
>
> Gary
>
>
> On Thu, Apr 9, 2020 at 8:45 PM Alex Remily  wrote:
>
>> Commons Crypto Team,
>>
>> It's been four years since the last release of commons crypto.  There have
>> been many updates to the repository since then, notably the integration of
>> OpenSSL 1.1.1, and native arm64 support.  Geoff Blake (copied) and I have
>> been advocating for a new release, and we need assistance from someone who
>> knows the release process and has the necessary accesses.  If any of you
>> are willing to assist with this effort, please come up on the apache
>> commons dev list (also copied) and announce yourself.
>>
>> Looking forward to hearing from you.
>>
>> Alex
>>
>


Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-10 Thread Marcelo Vanzin
-0.5, mostly because this requires extra things not in the default
packaging...

But if you add the hadoop-aws libraries and dependencies to Spark built
with Hadoop 3, things don't work:

$ ./bin/spark-shell --jars s3a://blah
20/04/10 16:28:32 WARN Utils: Your hostname, vanzin-t480 resolves to a
loopback address: 127.0.1.1; using 192.168.2.14 instead (on interface
wlp3s0)
20/04/10 16:28:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
to another address
20/04/10 16:28:32 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
20/04/10 16:28:32 WARN MetricsConfig: Cannot locate configuration:
tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at
org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:816)
at
org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:792)
at
org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:747)
at
org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
at
org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
at
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at
org.apache.spark.deploy.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:191)

That's because Hadoop 3.2 is using Guava 27 and Spark still ships Guava 14
(which is ok for Hadoop 2).
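
One way to confirm which Guava the driver actually picked up (an illustrative
check; the only thing taken from above is the Preconditions class named in the
stack trace) is to ask the JVM where that class came from, e.g. in spark-shell:

    // Prints the jar that provided Guava's Preconditions; if it's the bundled
    // guava-14 jar, the 4-arg checkArgument overload in the trace doesn't exist.
    val src = classOf[com.google.common.base.Preconditions]
      .getProtectionDomain.getCodeSource
    println(if (src == null) "bootstrap/unknown" else src.getLocation.toString)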


On Tue, Mar 31, 2020 at 8:05 PM Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.0.0.
>
> The vote is open until 11:59pm Pacific time Fri Apr 3, and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.0.0-rc1 (commit
> 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1):
> https://github.com/apache/spark/tree/v3.0.0-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1341/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc1-docs/
>
> The list of bug fixes going into 2.4.5 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12339177
>
> This release is using the release script of the tag v3.0.0-rc1.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.0.0?
> ===
> The current list of open tickets targeted at 3.0.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said

[jira] [Resolved] (SPARK-31021) Support MariaDB Kerberos login in JDBC connector

2020-04-09 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31021.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support MariaDB Kerberos login in JDBC connector
> 
>
> Key: SPARK-31021
> URL: https://issues.apache.org/jira/browse/SPARK-31021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (014d335 -> 1354d2d)

2020-04-09 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 014d335  [SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data 
and test tables among multiple test cases
 add 1354d2d  [SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC 
connector

No new revisions were added by this update.

Summary of changes:
 external/docker-integration-tests/pom.xml  |  4 +-
 .../test/resources/mariadb_docker_entrypoint.sh| 20 ++
 .../src/test/resources/mariadb_krb_setup.sh|  6 +-
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala  | 22 +--
 .../sql/jdbc/DockerKrbJDBCIntegrationSuite.scala   | 75 +++--
 .../sql/jdbc/MariaDBKrbIntegrationSuite.scala  | 67 +++
 .../sql/jdbc/MsSqlServerIntegrationSuite.scala |  2 -
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  1 -
 .../spark/sql/jdbc/OracleIntegrationSuite.scala|  1 -
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  |  1 -
 .../sql/jdbc/PostgresKrbIntegrationSuite.scala | 76 ++
 pom.xml|  6 ++
 sql/core/pom.xml   |  4 +-
 .../jdbc/connection/ConnectionProvider.scala   |  7 ++
 .../connection/MariaDBConnectionProvider.scala | 54 +++
 .../connection/PostgresConnectionProvider.scala| 50 +++---
 .../jdbc/connection/SecureConnectionProvider.scala | 75 +
 .../connection/ConnectionProviderSuiteBase.scala   | 69 
 .../MariaDBConnectionProviderSuite.scala}  | 12 ++--
 .../PostgresConnectionProviderSuite.scala  | 61 ++---
 20 files changed, 399 insertions(+), 214 deletions(-)
 copy bin/beeline => 
external/docker-integration-tests/src/test/resources/mariadb_docker_entrypoint.sh
 (67%)
 copy dev/scalafmt => 
external/docker-integration-tests/src/test/resources/mariadb_krb_setup.sh (77%)
 create mode 100644 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/SecureConnectionProvider.scala
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuiteBase.scala
 copy 
sql/core/src/{main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/BasicConnectionProvider.scala
 => 
test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProviderSuite.scala}
 (70%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Commented] (CRYPTO-139) adding support for AARCH64 architecture in common crypto

2020-03-31 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/CRYPTO-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072285#comment-17072285
 ] 

Marcelo Masiero Vanzin commented on CRYPTO-139:
---

[~blakegw] closing but I don't really have enough permissions to assign this to 
you (since you need to be added as a contributor to the project).

> adding support for AARCH64 architecture in common crypto
> 
>
> Key: CRYPTO-139
> URL: https://issues.apache.org/jira/browse/CRYPTO-139
> Project: Commons Crypto
>  Issue Type: New Feature
>Reporter: puresoftware
>Priority: Critical
> Attachments: CRYPTO-139-commons-crypto.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> AARCH64 architecture support is missing in the existing common-crypto code. 
> We made some code-level modifications to add support for AARCH64. The 
> common-crypto project can now be used on the aarch64 platform and is also 
> compatible with OpenSSL 1.1.x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CRYPTO-139) adding support for AARCH64 architecture in common crypto

2020-03-31 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/CRYPTO-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved CRYPTO-139.
---
Fix Version/s: 1.1.0
   Resolution: Fixed

> adding support for AARCH64 architecture in common crypto
> 
>
> Key: CRYPTO-139
> URL: https://issues.apache.org/jira/browse/CRYPTO-139
> Project: Commons Crypto
>  Issue Type: New Feature
>Reporter: puresoftware
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: CRYPTO-139-commons-crypto.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> AARCH64 architecture support is missing in the existing common-crypto code. 
> We made some code-level modifications to add support for AARCH64. The 
> common-crypto project can now be used on the aarch64 platform and is also 
> compatible with OpenSSL 1.1.x.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[commons-crypto] branch master updated: [CRYPTO-139] Makefile.common support for aarch64 native libraries. (#95)

2020-03-31 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-crypto.git


The following commit(s) were added to refs/heads/master by this push:
 new 80a90ca  [CRYPTO-139] Makefile.common support for aarch64 native 
libraries. (#95)
80a90ca is described below

commit 80a90ca016e55cc3d3c6ea650f28390fd3c0e9a2
Author: Geoffrey Blake 
AuthorDate: Tue Mar 31 20:15:34 2020 -0500

[CRYPTO-139] Makefile.common support for aarch64 native libraries. (#95)
---
 Makefile.common | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/Makefile.common b/Makefile.common
index 7a41d71..f0f99c5 100644
--- a/Makefile.common
+++ b/Makefile.common
@@ -53,7 +53,7 @@ endif
 
 # os=Default is meant to be generic unix/linux
 
-known_os_archs := Linux-x86 Linux-x86_64 Linux-arm Linux-armhf Linux-ppc 
Linux-ppc64 Mac-x86 Mac-x86_64 FreeBSD-x86_64 Windows-x86 Windows-x86_64 
SunOS-x86 SunOS-sparc SunOS-x86_64 AIX-ppc64
+known_os_archs := Linux-x86 Linux-x86_64 Linux-aarch64 Linux-arm Linux-armhf 
Linux-ppc Linux-ppc64 Mac-x86 Mac-x86_64 FreeBSD-x86_64 Windows-x86 
Windows-x86_64 SunOS-x86 SunOS-sparc SunOS-x86_64 AIX-ppc64
 os_arch := $(OS_NAME)-$(OS_ARCH)
 
 ifeq (,$(findstring $(strip $(os_arch)),$(known_os_archs)))
@@ -169,6 +169,15 @@ Linux-armhf_LINKFLAGS := -shared -static-libgcc
 Linux-armhf_LIBNAME   := libcommons-crypto.so
 Linux-armhf_COMMONS_CRYPTO_FLAGS:=
 
+Linux-aarch64_CC:= $(CROSS_PREFIX)gcc
+Linux-aarch64_CXX   := $(CROSS_PREFIX)g++
+Linux-aarch64_STRIP := $(CROSS_PREFIX)strip
+Linux-aarch64_CXXFLAGS  := -Ilib/inc_linux -I$(JAVA_HOME)/include 
-Ilib/inc_mac -O2 -fPIC -fvisibility=hidden -Wall -Werror
+Linux-aarch64_CFLAGS:= -Ilib/inc_linux -I$(JAVA_HOME)/include 
-Ilib/inc_mac -O2 -fPIC -fvisibility=hidden -Wall -Werror
+Linux-aarch64_LINKFLAGS := -shared -static-libgcc
+Linux-aarch64_LIBNAME   := libcommons-crypto.so
+Linux-aarch64_COMMONS_CRYPTO_FLAGS  :=
+
 Mac-x86_CC:= gcc -arch i386
 Mac-x86_CXX   := g++ -arch i386
 Mac-x86_STRIP := strip -x



[jira] [Resolved] (SPARK-30874) Support Postgres Kerberos login in JDBC connector

2020-03-12 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30874.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support Postgres Kerberos login in JDBC connector
> -
>
> Key: SPARK-30874
> URL: https://issues.apache.org/jira/browse/SPARK-30874
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated: [SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector

2020-03-12 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 231e650  [SPARK-30874][SQL] Support Postgres Kerberos login in JDBC 
connector
231e650 is described below

commit 231e65092fa97516e30c4ef12e635bfe3e97c7f0
Author: Gabor Somogyi 
AuthorDate: Thu Mar 12 19:04:35 2020 -0700

[SPARK-30874][SQL] Support Postgres Kerberos login in JDBC connector

### What changes were proposed in this pull request?
When loading DataFrames from a JDBC data source with Kerberos authentication, 
remote executors (yarn-client/cluster etc. modes) fail to establish a 
connection due to the lack of a Kerberos ticket or the ability to generate one.

This is a real issue when trying to ingest data from kerberized data 
sources (SQL Server, Oracle) in enterprise environments where exposing simple 
authentication access is not an option due to IT policy.

In this PR I've added Postgres support (other supported databases will come 
in later PRs).

What this PR contains:
* Added `keytab` and `principal` JDBC options
* Added `ConnectionProvider` trait and its implementations:
  * `BasicConnectionProvider` => unsecured connection
  * `PostgresConnectionProvider` => Postgres secure connection
* Added `ConnectionProvider` tests
* Added `PostgresKrbIntegrationSuite` docker integration test
* Created `SecurityUtils` to gather reusable security-related 
functionality
* Documentation

### Why are the changes needed?
Missing JDBC kerberos support.

### Does this PR introduce any user-facing change?
Yes, 2 additional JDBC options added:
* keytab
* principal

If both provided then Spark does kerberos authentication.

### How was this patch tested?
To demonstrate the functionality with a standalone application I've created 
this repository: https://github.com/gaborgsomogyi/docker-kerberos

* Additional + existing unit tests
* Additional docker integration test
* Test on cluster manually
* `SKIP_API=1 jekyll build`

Closes #27637 from gaborgsomogyi/SPARK-30874.

Authored-by: Gabor Somogyi 
Signed-off-by: Marcelo Vanzin 
---
 .../org/apache/spark/util/SecurityUtils.scala  |  69 +++
 docs/sql-data-sources-jdbc.md  |  14 +++
 external/docker-integration-tests/pom.xml  |   5 +
 .../src/test/resources/log4j.properties|  36 ++
 .../src/test/resources/postgres_krb_setup.sh   |  21 
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala  |  15 ++-
 .../sql/jdbc/DockerKrbJDBCIntegrationSuite.scala   |  94 +++
 .../sql/jdbc/PostgresKrbIntegrationSuite.scala | 129 +
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala |  30 +
 .../org/apache/spark/kafka010/KafkaTokenUtil.scala |  32 +
 .../execution/datasources/jdbc/JDBCOptions.scala   |  25 +++-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |   3 +-
 .../jdbc/connection/BasicConnectionProvider.scala  |  29 +
 .../jdbc/connection/ConnectionProvider.scala   |  52 +
 .../connection/PostgresConnectionProvider.scala|  82 +
 .../PostgresConnectionProviderSuite.scala  |  85 ++
 16 files changed, 663 insertions(+), 58 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/util/SecurityUtils.scala 
b/core/src/main/scala/org/apache/spark/util/SecurityUtils.scala
new file mode 100644
index 000..7831f66
--- /dev/null
+++ b/core/src/main/scala/org/apache/spark/util/SecurityUtils.scala
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+/**
+ * Various utility methods used by Spark Security.
+ */
+private[spark] object SecurityUtils {
+  private val JAVA_VENDOR = "java.vendor"
+  private val IBM_KRB_DEBUG_CONFIG = "com.ibm.security.krb5.Krb5Debug"
+  private val SUN_KRB_DEBUG_CONFIG = "sun.security.krb5.debug"
+
+  def setGlo

Re: Keytab, Proxy User & Principal

2020-03-12 Thread Marcelo Vanzin
On Fri, Feb 28, 2020 at 6:21 AM Lars Francke  wrote:

> Can we not allow specifying a keytab and principal together with proxy
> user but those are only used for the initial login to submit the job and
> are not shipped to the cluster? This way jobs wouldn't need to rely on the
> operating system.
>

I'm not sure I 100% understand your use case (even if multiple services are
using the credential cache, why would that be a problem?), but from Spark's
side, the only issue with this is making it clear to the user when things
are being submitted one way or another.

But frankly this feels more like something better taken care of in Livy
(e.g. by using KRB5CCNAME when running spark-submit).

-- 
Marcelo Vanzin
van...@gmail.com
"Life's too short to drink cheap beer"


[spark] branch story/PLAT-2902/create-spark-gr-with-ci updated: Bootstrapping CircleCI.

2020-02-19 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch story/PLAT-2902/create-spark-gr-with-ci
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to 
refs/heads/story/PLAT-2902/create-spark-gr-with-ci by this push:
 new bb81672  Bootstrapping CircleCI.
bb81672 is described below

commit bb81672a7eccd3419c162792b276eacd1fc366ed
Author: Marcelo Vanzin 
AuthorDate: Wed Feb 19 16:39:12 2020 -0800

Bootstrapping CircleCI.
---
 config.yml  | 39 +++
 gr/build.sh | 12 
 2 files changed, 51 insertions(+)

diff --git a/config.yml b/config.yml
new file mode 100644
index 000..360ad5b
--- /dev/null
+++ b/config.yml
@@ -0,0 +1,39 @@
+version: 2.1
+
+orbs:
+  aws-ecr: circleci/aws-ecr@6.5.0
+
+jobs:
+  build:
+machine:
+  image: ubuntu-1604:201903-01
+steps:
+  - checkout
+  - restore_cache:
+  key: gr-spark-mvn-cache
+  paths:
+- ~/.m2
+  - run:
+  name: "Build Spark"
+  command: gr/build.sh
+  - save_cache:
+  key: gr-spark-mvn-cache
+  paths:
+- ~/.m2
+
+# workflows:
+#   build_and_push:
+# jobs:
+#   - run_tests
+#   - aws-ecr/build-and-push-image:
+#   account-url: ECR_ENDPOINT
+#   aws-access-key-id: AWS_ACCESS_KEY_ID
+#   aws-secret-access-key: AWS_SECRET_ACCESS_KEY
+#   create-repo: false
+#   dockerfile: Dockerfile
+#   profile-name: circleci
+#   region: AWS_DEFAULT_REGION
+#   repo: protean-operator
+#   tag: ${CIRCLE_SHA1}
+#   requires:
+# - run_tests
diff --git a/gr/build.sh b/gr/build.sh
new file mode 100755
index 000..700165e
--- /dev/null
+++ b/gr/build.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+#
+# Build Spark with all the needed options for GR.
+#
+
+cd $(dirname $0)/..
+./dev/make-distribution.sh \
+   --tgz   
\
+   -Pkubernetes\
+   -Phive  
\
+   -Phadoop-cloud  \
+   -Dhadoop.version=2.10.0


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Package AWS SDK.

2020-02-19 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch story/PLAT-2902/create-spark-gr-with-ci
in repository https://gitbox.apache.org/repos/asf/spark.git

commit ff7ba54865a35c5fe1f8de92b4e52202931598e1
Author: Marcelo Vanzin 
AuthorDate: Thu Feb 13 16:14:09 2020 -0800

Package AWS SDK.
---
 hadoop-cloud/pom.xml | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hadoop-cloud/pom.xml b/hadoop-cloud/pom.xml
index 42941b9..b3e 100644
--- a/hadoop-cloud/pom.xml
+++ b/hadoop-cloud/pom.xml
@@ -102,11 +102,6 @@
   com.fasterxml.jackson.core
   jackson-annotations
 
-
-
-  com.amazonaws
-  aws-java-sdk
-
   
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch story/PLAT-2902/create-spark-gr-with-ci created (now ff7ba54)

2020-02-19 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch story/PLAT-2902/create-spark-gr-with-ci
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at ff7ba54  Package AWS SDK.

This branch includes the following new commits:

 new ff7ba54  Package AWS SDK.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (580c2b7 -> a2fe73b)

2020-01-28 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 580c2b7  [SPARK-27166][SQL][FOLLOWUP] Refactor to build string once
 add a2fe73b  [SPARK-30481][CORE] Integrate event log compactor into Spark 
History Server

No new revisions were added by this update.

Summary of changes:
 .../deploy/history/EventLogFileCompactor.scala |   9 +-
 .../spark/deploy/history/FsHistoryProvider.scala   | 173 +++--
 .../org/apache/spark/internal/config/History.scala |  16 ++
 .../org/apache/spark/internal/config/package.scala |  18 ---
 .../history/EventLogFileCompactorSuite.scala   |  49 +++---
 .../deploy/history/FsHistoryProviderSuite.scala| 126 +--
 docs/monitoring.md |  21 ++-
 7 files changed, 313 insertions(+), 99 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-30481) Integrate event log compactor into Spark History Server

2020-01-28 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30481.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Integrate event log compactor into Spark History Server
> ---
>
> Key: SPARK-30481
> URL: https://issues.apache.org/jira/browse/SPARK-30481
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking guaranteeing of compatibility.
> This issue depends on SPARK-29779 and SPARK-30479, and focuses on integrating 
> event log compactor into Spark History Server and enable configurations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018256#comment-17018256
 ] 

Marcelo Masiero Vanzin commented on SPARK-30557:


I don't exactly remember what that does, but a quick look seems to indicate 
it's basically another way of setting JVM options, used in some internal code. 
We have {{--driver-java-options}} for users already.

> Add public documentation for SPARK_SUBMIT_OPTS
> --
>
> Key: SPARK-30557
> URL: https://issues.apache.org/jira/browse/SPARK-30557
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Documentation
>Affects Versions: 2.4.4
>Reporter: Nicholas Chammas
>Priority: Minor
>
> Is `SPARK_SUBMIT_OPTS` part of Spark's public interface? If so, it needs some 
> documentation. I cannot see it documented 
> [anywhere|https://github.com/apache/spark/search?q=SPARK_SUBMIT_OPTS_q=SPARK_SUBMIT_OPTS]
>  in the docs.
> How do you use it? What is it useful for? What's an example usage? etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29876) Delete/archive file source completed files in separate thread

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29876.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26502
[https://github.com/apache/spark/pull/26502]

> Delete/archive file source completed files in separate thread
> -
>
> Key: SPARK-29876
> URL: https://issues.apache.org/jira/browse/SPARK-29876
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-20568 added the possibility to clean up completed files in streaming 
> queries. Deleting/archiving uses the main thread, which can slow down 
> processing. It would be good to do this on a separate thread (or threads).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29876) Delete/archive file source completed files in separate thread

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29876:
--

Assignee: Gabor Somogyi

> Delete/archive file source completed files in separate thread
> -
>
> Key: SPARK-29876
> URL: https://issues.apache.org/jira/browse/SPARK-29876
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> SPARK-20568 added the possibility to clean up completed files in a streaming 
> query. Deleting/archiving uses the main thread, which can slow down 
> processing. It would be good to do this on a separate thread (or threads).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated: [SPARK-29876][SS] Delete/archive file source completed files in separate thread

2020-01-17 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new abf759a  [SPARK-29876][SS] Delete/archive file source completed files 
in separate thread
abf759a is described below

commit abf759a91e01497586b8bb6b7a314dd28fd6cff1
Author: Gabor Somogyi 
AuthorDate: Fri Jan 17 10:45:36 2020 -0800

[SPARK-29876][SS] Delete/archive file source completed files in separate 
thread

### What changes were proposed in this pull request?
[SPARK-20568](https://issues.apache.org/jira/browse/SPARK-20568) added the 
possibility to clean up completed files in a streaming query. Deleting/archiving 
uses the main thread, which can slow down processing. In this PR I've created a 
thread pool to handle file deletion/archival. The number of threads can be 
configured with `spark.sql.streaming.fileSource.cleaner.numThreads`.

### Why are the changes needed?
Do file delete/archival in separate thread.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing unit tests.
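
As a rough usage sketch (not part of the patch): paths, schema and thread count below are illustrative, and the `cleanSource`/`sourceArchiveDir` option names follow the Spark 3.0 file source documentation rather than this commit.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: a file stream that archives completed files, with the
// cleaner thread pool size introduced by this change. Paths are placeholders.
object FileCleanerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-cleaner-sketch")
      .config("spark.sql.streaming.fileSource.cleaner.numThreads", "4")
      .getOrCreate()

    val lines = spark.readStream
      .format("text")
      .option("cleanSource", "archive")             // move completed files...
      .option("sourceArchiveDir", "/archived/here") // ...under this directory
      .load("/data/incoming/*")

    lines.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/file-cleaner-sketch-checkpoint")
      .start()
      .awaitTermination()
  }
}
```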

Closes #26502 from gaborgsomogyi/SPARK-29876.

Authored-by: Gabor Somogyi 
Signed-off-by: Marcelo Vanzin 
---
 docs/structured-streaming-programming-guide.md |  5 +--
 .../org/apache/spark/sql/internal/SQLConf.scala|  6 
 .../sql/execution/streaming/FileStreamSource.scala | 40 +++---
 .../sql/streaming/FileStreamSourceSuite.scala  |  9 +++--
 4 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 306d688..429d456 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -551,9 +551,10 @@ Here are the details of all the sources in Spark.
 When "archive" is provided, additional option 
sourceArchiveDir must be provided as well. The value of 
"sourceArchiveDir" must not match with source pattern in depth (the number of 
directories from the root directory), where the depth is minimum of depth on 
both paths. This will ensure archived files are never included as new source 
files.
 For example, suppose you provide '/hello?/spark/*' as source pattern, 
'/hello1/spark/archive/dir' cannot be used as the value of "sourceArchiveDir", 
as '/hello?/spark/*' and '/hello1/spark/archive' will be matched. 
'/hello1/spark' cannot be also used as the value of "sourceArchiveDir", as 
'/hello?/spark' and '/hello1/spark' will be matched. '/archived/here' would be 
OK as it doesn't match.
 Spark will move source files respecting their own path. For example, 
if the path of source file is /a/b/dataset.txt and the path of 
archive directory is /archived/here, file will be moved to 
/archived/here/a/b/dataset.txt.
-NOTE: Both archiving (via moving) or deleting completed files will 
introduce overhead (slow down) in each micro-batch, so you need to understand 
the cost for each operation in your file system before enabling this option. On 
the other hand, enabling this option will reduce the cost to list source files 
which can be an expensive operation.
+NOTE: Both archiving (via moving) or deleting completed files will 
introduce overhead (slow down, even if it's happening in separate thread) in 
each micro-batch, so you need to understand the cost for each operation in your 
file system before enabling this option. On the other hand, enabling this 
option will reduce the cost to list source files which can be an expensive 
operation.
+Number of threads used in completed file cleaner can be configured 
with spark.sql.streaming.fileSource.cleaner.numThreads (default: 
1).
 NOTE 2: The source path should not be used from multiple sources or 
queries when enabling this option. Similarly, you must ensure the source path 
doesn't match to any files in output directory of file stream sink.
-NOTE 3: Both delete and move actions are best effort. Failing to 
delete or move files will not fail the streaming query.
+NOTE 3: Both delete and move actions are best effort. Failing to 
delete or move files will not fail the streaming query. Spark may not clean up 
some source files in some circumstances - e.g. the application doesn't shut 
down gracefully, too many files are queued to clean up.
 
For file-format-specific options, see the related methods in 
DataStreamReader (Scala/Java/Python/R).
+  if (numThreads > 0) {
+logDebug(s"Cleaning file source on $numThreads separate thread(s)")
+
Some(ThreadUtils.newDaemonCachedThreadPool("file-source-cleaner-threadpool", 
numThreads))
+  } else {
+logDebug("Cleaning file source on ma

[spark] branch master updated (fd308ad -> 830e635)

2020-01-17 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fd308ad  [SPARK-30041][SQL][WEBUI] Add Codegen Stage Id to Stage DAG 
visualization in Web UI
 add 830e635  [SPARK-27868][CORE][FOLLOWUP] Recover the default value to -1 
again

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/network/util/TransportConf.java| 8 ++--
 docs/configuration.md | 5 +++--
 2 files changed, 9 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Commented] (SPARK-27868) Better document shuffle / RPC listen backlog

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018219#comment-17018219
 ] 

Marcelo Masiero Vanzin commented on SPARK-27868:


It's ok for now, it's done. Hopefully 3.0 will come out soon and the "main" 
documentation on the site will have the info.

> Better document shuffle / RPC listen backlog
> 
>
> Key: SPARK-27868
> URL: https://issues.apache.org/jira/browse/SPARK-27868
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.3
>Reporter: Marcelo Masiero Vanzin
>    Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> The option to control the listen socket backlog for RPC and shuffle servers 
> is not documented in our public docs.
> The only piece of documentation is in a Java class, and even that 
> documentation is incorrect:
> {code}
>   /** Requested maximum length of the queue of incoming connections. Default 
> -1 for no backlog. */
>   public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, 
> -1); }
> {code}
> The default value actually causes the default value from the JRE to be used, 
> which is 50 according to the docs.
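
For reference, a spark-shell style sketch of setting the backlog; the module-scoped key name (shown here for the shuffle module) is inferred from the snippet above and should be treated as an assumption:

{code}
import org.apache.spark.SparkConf

// Sketch: requested listen-socket backlog for the shuffle module's transport
// server. Key name assumed to follow "spark.<module>.io.backLog"; leaving it
// unset falls back to the JRE default (50).
val conf = new SparkConf()
  .set("spark.shuffle.io.backLog", "256")
{code}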



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29950) Deleted excess executors can connect back to driver in K8S with dyn alloc on

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29950:
--

Assignee: Marcelo Masiero Vanzin

> Deleted excess executors can connect back to driver in K8S with dyn alloc on
> 
>
> Key: SPARK-29950
> URL: https://issues.apache.org/jira/browse/SPARK-29950
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>    Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>
> {{ExecutorPodsAllocator}} currently has code to delete excess pods that the 
> K8S server hasn't started yet, and aren't needed anymore due to downscaling.
> The problem is that there is a race between K8S starting the pod and the 
> Spark code deleting it. This may cause the pod to connect back to Spark and 
> do a lot of initialization, sometimes even being considered for task 
> allocation, just to be killed almost immediately.
> This doesn't cause any problems that I could detect in my tests, but wastes 
> resources, and causes logs to contain misleading messages about the executor 
> being killed. It would be nice to avoid that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated: [SPARK-29950][K8S] Blacklist deleted executors in K8S with dynamic allocation

2020-01-16 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dca8380  [SPARK-29950][K8S] Blacklist deleted executors in K8S with 
dynamic allocation
dca8380 is described below

commit dca838058ffd0e2c01591fd9ab0f192de446d606
Author: Marcelo Vanzin 
AuthorDate: Thu Jan 16 13:37:11 2020 -0800

[SPARK-29950][K8S] Blacklist deleted executors in K8S with dynamic 
allocation

The issue here is that when Spark is downscaling the application and deletes
a few pod requests that aren't needed anymore, it may actually race with the
K8S scheduler, which may be bringing up those executors. So they may have enough
time to connect back to the driver and register, just to be deleted soon after.
This wastes resources and causes misleading entries in the driver log.

The change (ab)uses the blacklisting mechanism to consider the deleted 
excess
pods as blacklisted, so that if they try to connect back, the driver will 
deny
it.

It also changes the executor registration slightly, since even with the 
above
change there were misleading logs. That was because the executor 
registration
message was an RPC that always succeeded (bar network issues), so the 
executor
would always try to send an unregistration message to the driver, which 
would
then log several messages about not knowing anything about the executor. The
change makes the registration RPC succeed or fail directly, instead of using
the separate failure message that would lead to this issue.

Note the last change required some changes in a standalone test suite 
related
to dynamic allocation, since it relied on the driver not throwing exceptions
when a duplicate executor registration happened.

Tested with existing unit tests, and with live cluster with dyn alloc on.
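
For context, the race only shows up with dynamic allocation active on Kubernetes; a hedged, spark-shell style sketch of that setup (the property names are the standard dynamic-allocation ones and are an assumption here, not taken from this patch):

```scala
import org.apache.spark.SparkConf

// Sketch of the scenario this change targets: dynamic allocation on K8S, where
// the driver requests executor pods and later cancels the ones it no longer needs.
// Master URL and executor bounds are placeholders.
val conf = new SparkConf()
  .setMaster("k8s://https://kubernetes.example.com:6443")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.shuffleTracking.enabled", "true") // no external shuffle service on K8S
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "50")
```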

Closes #26586 from vanzin/SPARK-29950.

Authored-by: Marcelo Vanzin 
Signed-off-by: Marcelo Vanzin 
---
 .../executor/CoarseGrainedExecutorBackend.scala| 14 +++--
 .../cluster/CoarseGrainedClusterMessage.scala  |  7 ---
 .../cluster/CoarseGrainedSchedulerBackend.scala| 19 +--
 .../deploy/StandaloneDynamicAllocationSuite.scala  | 65 ++
 .../CoarseGrainedSchedulerBackendSuite.scala   |  1 +
 .../cluster/k8s/ExecutorPodsAllocator.scala| 18 ++
 .../k8s/KubernetesClusterSchedulerBackend.scala|  4 ++
 .../DeterministicExecutorPodsSnapshotsStore.scala  |  9 +++
 .../cluster/k8s/ExecutorPodsAllocatorSuite.scala   | 11 
 9 files changed, 105 insertions(+), 43 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index b1837c9..1fe901a 100644
--- 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -54,6 +54,8 @@ private[spark] class CoarseGrainedExecutorBackend(
 resourcesFileOpt: Option[String])
   extends IsolatedRpcEndpoint with ExecutorBackend with Logging {
 
+  import CoarseGrainedExecutorBackend._
+
   private implicit val formats = DefaultFormats
 
   private[this] val stopping = new AtomicBoolean(false)
@@ -80,9 +82,8 @@ private[spark] class CoarseGrainedExecutorBackend(
   ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, 
extractLogUrls,
 extractAttributes, resources))
 }(ThreadUtils.sameThread).onComplete {
-  // This is a very fast action so we can use "ThreadUtils.sameThread"
-  case Success(msg) =>
-// Always receive `true`. Just ignore it
+  case Success(_) =>
+self.send(RegisteredExecutor)
   case Failure(e) =>
 exitExecutor(1, s"Cannot register with driver: $driverUrl", e, 
notifyDriver = false)
 }(ThreadUtils.sameThread)
@@ -133,9 +134,6 @@ private[spark] class CoarseGrainedExecutorBackend(
   exitExecutor(1, "Unable to create executor due to " + e.getMessage, 
e)
   }
 
-case RegisterExecutorFailed(message) =>
-  exitExecutor(1, "Slave registration failed: " + message)
-
 case LaunchTask(data) =>
   if (executor == null) {
 exitExecutor(1, "Received LaunchTask command but executor was null")
@@ -226,6 +224,10 @@ private[spark] class CoarseGrainedExecutorBackend(
 
 private[spark] object CoarseGrainedExecutorBackend extends Logging {
 
+  // Message used internally to start the executor when the driver 
successfully accepted the
+  // registration request.
+  case object RegisteredExecutor
+
   case class Arguments(
   driverUrl: String,
   executorId: String,
dif

[jira] [Resolved] (SPARK-29950) Deleted excess executors can connect back to driver in K8S with dyn alloc on

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29950.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26586
[https://github.com/apache/spark/pull/26586]

> Deleted excess executors can connect back to driver in K8S with dyn alloc on
> 
>
> Key: SPARK-29950
> URL: https://issues.apache.org/jira/browse/SPARK-29950
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>    Assignee: Marcelo Masiero Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> {{ExecutorPodsAllocator}} currently has code to delete excess pods that the 
> K8S server hasn't started yet, and aren't needed anymore due to downscaling.
> The problem is that there is a race between K8S starting the pod and the 
> Spark code deleting it. This may cause the pod to connect back to Spark and 
> do a lot of initialization, sometimes even being considered for task 
> allocation, just to be killed almost immediately.
> This doesn't cause any problems that I could detect in my tests, but wastes 
> resources, and causes logs to contain misleading messages about the executor 
> being killed. It would be nice to avoid that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27868) Better document shuffle / RPC listen backlog

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017281#comment-17017281
 ] 

Marcelo Masiero Vanzin commented on SPARK-27868:


You shouldn't have reverted the whole change. The documentation and extra 
logging are still really useful.

> Better document shuffle / RPC listen backlog
> 
>
> Key: SPARK-27868
> URL: https://issues.apache.org/jira/browse/SPARK-27868
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.3
>Reporter: Marcelo Masiero Vanzin
>    Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> The option to control the listen socket backlog for RPC and shuffle servers 
> is not documented in our public docs.
> The only piece of documentation is in a Java class, and even that 
> documentation is incorrect:
> {code}
>   /** Requested maximum length of the queue of incoming connections. Default 
> -1 for no backlog. */
>   public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, 
> -1); }
> {code}
> The default value actually causes the default value from the JRE to be used, 
> which is 50 according to the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin updated SPARK-30246:
---
Fix Version/s: (was: 2.4.5)
   2.4.6

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Assignee: Henrique dos Santos Goulart
>Priority: Major
> Fix For: 3.0.0, 2.4.6
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle service as 
> part of the YARN NM aux services, we encountered OOMs in some NMs.
> After dumping the heap memory, I found there are some StreamState objects 
> still in the heap, even though the app that the StreamState belongs to has already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (occupying 1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel: 
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30246:
--

Assignee: Henrique dos Santos Goulart

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Assignee: Henrique dos Santos Goulart
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle service as 
> part of the YARN NM aux services, we encountered OOMs in some NMs.
> After dumping the heap memory, I found there are some StreamState objects 
> still in the heap, even though the app that the StreamState belongs to has already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (occupying 1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel: 
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30246.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 27064
[https://github.com/apache/spark/pull/27064]

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle service as 
> part of the YARN NM aux services, we encountered OOMs in some NMs.
> After dumping the heap memory, I found there are some StreamState objects 
> still in the heap, even though the app that the StreamState belongs to has already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (occupying 1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel: 
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (6c178a5 -> d42cf45)

2020-01-15 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6c178a5  [SPARK-30495][SS] Consider 
spark.security.credentials.kafka.enabled and cluster configuration when 
checking latest delegation token
 add d42cf45  [SPARK-30246][CORE] OneForOneStreamManager might leak memory 
in connectionTerminated

No new revisions were added by this update.

Summary of changes:
 .../network/server/OneForOneStreamManager.java | 24 ++---
 .../server/OneForOneStreamManagerSuite.java| 39 ++
 2 files changed, 58 insertions(+), 5 deletions(-)
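
The essence of the fix, shown as a simplified sketch rather than the actual Spark code: state registered for a channel has to be dropped when that channel terminates, otherwise it stays reachable from the stream map and accumulates on the shuffle service until it OOMs.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

// Simplified illustration of the leak pattern addressed here (not Spark code).
case class StreamState(appId: String, channelId: String)

class StreamRegistry {
  private val streams = new ConcurrentHashMap[Long, StreamState]()

  def registerStream(id: Long, state: StreamState): Unit = streams.put(id, state)

  // Called when a client connection goes away; drop *all* state tied to that channel
  // so its buffers can be garbage-collected.
  def connectionTerminated(channelId: String): Unit = {
    streams.asScala.foreach { case (id, state) =>
      if (state.channelId == channelId) {
        streams.remove(id)
      }
    }
  }
}
```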


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-30495) How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30495.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27191
[https://github.com/apache/spark/pull/27191]

> How to disable 'spark.security.credentials.${service}.enabled' in Structured 
> streaming while connecting to a kafka cluster
> --
>
> Key: SPARK-30495
> URL: https://issues.apache.org/jira/browse/SPARK-30495
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: act_coder
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Trying to read data from a secured Kafka cluster using spark structured
>  streaming. Also, using the below library to read the data -
>  +*"spark-sql-kafka-0-10_2.12":"3.0.0-preview"*+ since it has the feature to
>  specify our custom group id (instead of spark setting its own custom group
>  id)
> +*Dependency used in code:*+
> {code}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
>   <version>3.0.0-preview</version>
> </dependency>
> {code}
>  
> +*Logs:*+
> Getting the below error - even after specifying the required JAAS
>  configuration in spark options.
> Caused by: java.lang.IllegalArgumentException: requirement failed:
>  *Delegation token must exist for this connector*. at
>  scala.Predef$.require(Predef.scala:281) at
> org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
>  at
>  
> org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
>  at
>  
> org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
>  
> +*Spark configuration used to read from Kafka:*+
> val kafkaDF = sparkSession.readStream
>  .format("kafka")
>  .option("kafka.bootstrap.servers", bootStrapServer)
>  .option("subscribe", kafkaTopic )
>  
> //Setting JAAS Configuration
> .option("kafka.sasl.jaas.config", KAFKA_JAAS_SASL)
>  .option("kafka.sasl.mechanism", "PLAIN")
>  .option("kafka.security.protocol", "SASL_SSL")
> // Setting custom consumer group id
> .option("kafka.group.id", "test_cg")
>  .load()
>  
> Following document specifies that we can disable the feature of obtaining
>  delegation token -
>  
> [https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html]
> Tried setting this property *spark.security.credentials.kafka.enabled to*
>  *false in spark config,* but it is still failing with the same error.
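
With the fix in place, combining a direct JAAS/SASL setup with the delegation-token machinery switched off looks roughly like the sketch below. Broker, topic and credentials are placeholders, and the `spark.security.credentials.kafka.enabled` flag normally has to be provided at submit time (spark-defaults.conf or --conf) so it is in effect when the token manager starts.

{code}
import org.apache.spark.sql.SparkSession

// Sketch only: disable Spark's Kafka delegation-token handling and pass JAAS
// credentials straight to the Kafka consumer. All values are placeholders.
val spark = SparkSession.builder()
  .appName("kafka-sasl-sketch")
  .config("spark.security.credentials.kafka.enabled", "false")
  .getOrCreate()

val kafkaDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")
  .option("subscribe", "my_topic")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config",
    """org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="pass";""")
  .option("kafka.group.id", "test_cg")
  .load()
{code}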



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30495) How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30495:
--

Assignee: Gabor Somogyi

> How to disable 'spark.security.credentials.${service}.enabled' in Structured 
> streaming while connecting to a kafka cluster
> --
>
> Key: SPARK-30495
> URL: https://issues.apache.org/jira/browse/SPARK-30495
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: act_coder
>Assignee: Gabor Somogyi
>Priority: Major
>
> Trying to read data from a secured Kafka cluster using spark structured
>  streaming. Also, using the below library to read the data -
>  +*"spark-sql-kafka-0-10_2.12":"3.0.0-preview"*+ since it has the feature to
>  specify our custom group id (instead of spark setting its own custom group
>  id)
> +*Dependency used in code:*+
> {code}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
>   <version>3.0.0-preview</version>
> </dependency>
> {code}
>  
> +*Logs:*+
> Getting the below error - even after specifying the required JAAS
>  configuration in spark options.
> Caused by: java.lang.IllegalArgumentException: requirement failed:
>  *Delegation token must exist for this connector*. at
>  scala.Predef$.require(Predef.scala:281) at
> org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
>  at
>  
> org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
>  at
>  
> org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
>  
> +*Spark configuration used to read from Kafka:*+
> val kafkaDF = sparkSession.readStream
>  .format("kafka")
>  .option("kafka.bootstrap.servers", bootStrapServer)
>  .option("subscribe", kafkaTopic )
>  
> //Setting JAAS Configuration
> .option("kafka.sasl.jaas.config", KAFKA_JAAS_SASL)
>  .option("kafka.sasl.mechanism", "PLAIN")
>  .option("kafka.security.protocol", "SASL_SSL")
> // Setting custom consumer group id
> .option("kafka.group.id", "test_cg")
>  .load()
>  
> Following document specifies that we can disable the feature of obtaining
>  delegation token -
>  
> [https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html]
> Tried setting this property *spark.security.credentials.kafka.enabled to*
>  *false in spark config,* but it is still failing with the same error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (e751bc6 -> 6c178a5)

2020-01-15 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e751bc6  [SPARK-30479][SQL] Apply compaction of event log to SQL events
 add 6c178a5  [SPARK-30495][SS] Consider 
spark.security.credentials.kafka.enabled and cluster configuration when 
checking latest delegation token

No new revisions were added by this update.

Summary of changes:
 .../security/HadoopDelegationTokenManager.scala| 61 ++-
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  |  2 +-
 .../org/apache/spark/kafka010/KafkaTokenUtil.scala | 14 +++--
 .../spark/kafka010/KafkaTokenUtilSuite.scala   | 69 +-
 4 files changed, 83 insertions(+), 63 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (990a2be -> e751bc6)

2020-01-15 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 990a2be  [SPARK-30378][ML][PYSPARK][FOLLOWUP] Remove Param fields 
provided by _FactorizationMachinesParams
 add e751bc6  [SPARK-30479][SQL] Apply compaction of event log to SQL events

No new revisions were added by this update.

Summary of changes:
 .../spark/status/ListenerEventsTestHelper.scala|  47 +++
 apache.spark.deploy.history.EventFilterBuilder |   1 +
 .../execution/history/SQLEventFilterBuilder.scala  | 147 +
 .../history/SQLEventFilterBuilderSuite.scala   | 107 +++
 .../history/SQLLiveEntitiesEventFilterSuite.scala  | 135 +++
 5 files changed, 437 insertions(+)
 create mode 100644 
sql/core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/history/SQLEventFilterBuilder.scala
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/history/SQLEventFilterBuilderSuite.scala
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/history/SQLLiveEntitiesEventFilterSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-30479) Apply compaction of event log to SQL events

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30479.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27164
[https://github.com/apache/spark/pull/27164]

> Apply compaction of event log to SQL events
> ---
>
> Key: SPARK-30479
> URL: https://issues.apache.org/jira/browse/SPARK-30479
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> This issue depends on SPARK-29779 and focuses on dealing with SQL events.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30479) Apply compaction of event log to SQL events

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30479:
--

Assignee: Jungtaek Lim

> Apply compaction of event log to SQL events
> ---
>
> Key: SPARK-30479
> URL: https://issues.apache.org/jira/browse/SPARK-30479
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> This issue depends on SPARK-29779 and focuses on dealing with SQL events.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (9320011 -> 0c6bd3b)

2020-01-14 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9320011  [SPARK-9478][ML][PYSPARK] Add sample weights to Random Forest
 add 0c6bd3b  [SPARK-27142][SQL] Provide REST API for SQL information

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ui/SQLAppStatusStore.scala |   4 +
 .../status/api/v1/sql/ApiSqlRootResource.scala |  16 ++--
 .../spark/status/api/v1/sql/SqlResource.scala  | 101 +
 .../org/apache/spark/status/api/v1/sql/api.scala   |  20 +++-
 4 files changed, 127 insertions(+), 14 deletions(-)
 copy common/unsafe/src/main/java/org/apache/spark/unsafe/KVIterator.java => 
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/ApiSqlRootResource.scala
 (73%)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
 copy core/src/main/scala/org/apache/spark/metrics/sink/package.scala => 
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/api.scala (65%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Assigned] (SPARK-27142) Provide REST API for SQL level information

2020-01-14 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-27142:
--

Assignee: Ajith S

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, SQL information about a Spark application is not available 
> from the REST API but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be found.
>  
> Details: 
> https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27142) Provide REST API for SQL level information

2020-01-14 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-27142.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 24076
[https://github.com/apache/spark/pull/24076]

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, SQL information about a Spark application is not available 
> from the REST API but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be found.
>  
> Details: 
> https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728
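
As a quick illustration, fetching the new endpoint from a running application's UI port; host, port and app id are placeholders, and the /applications/<app-id>/sql path is inferred from the resource classes listed in the pull request rather than quoted from it:

{code}
import scala.io.Source

// Sketch: query the SQL endpoint of the monitoring REST API.
val appId = "app-20200114120000-0001"                              // placeholder
val url = s"http://localhost:4040/api/v1/applications/$appId/sql"
println(Source.fromURL(url).mkString)
{code}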



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (2bd8731 -> 7fb17f59)

2020-01-10 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2bd8731  [SPARK-30468][SQL] Use multiple lines to display data columns 
for show create table command
 add 7fb17f59 [SPARK-29779][CORE] Compact old event log files and cleanup

No new revisions were added by this update.

Summary of changes:
 apache.spark.deploy.history.EventFilterBuilder |   1 +
 .../deploy/history/BasicEventFilterBuilder.scala   | 176 +++
 .../apache/spark/deploy/history/EventFilter.scala  | 109 +++
 .../deploy/history/EventLogFileCompactor.scala | 224 ++
 .../spark/deploy/history/EventLogFileReaders.scala |  28 +-
 .../spark/deploy/history/EventLogFileWriters.scala |  28 +-
 .../org/apache/spark/internal/config/package.scala |  18 ++
 .../history/BasicEventFilterBuilderSuite.scala | 228 ++
 .../deploy/history/BasicEventFilterSuite.scala | 208 +
 .../history/EventLogFileCompactorSuite.scala   | 326 +
 .../deploy/history/EventLogFileReadersSuite.scala  |   6 +-
 .../deploy/history/EventLogFileWritersSuite.scala  |   4 +-
 .../spark/deploy/history/EventLogTestHelper.scala  |  55 +++-
 .../spark/status/AppStatusListenerSuite.scala  |  38 +--
 .../spark/status/ListenerEventsTestHelper.scala| 154 ++
 15 files changed, 1545 insertions(+), 58 deletions(-)
 create mode 100644 
core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder
 create mode 100644 
core/src/main/scala/org/apache/spark/deploy/history/BasicEventFilterBuilder.scala
 create mode 100644 
core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala
 create mode 100644 
core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
 create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterBuilderSuite.scala
 create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterSuite.scala
 create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
 create mode 100644 
core/src/test/scala/org/apache/spark/status/ListenerEventsTestHelper.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Resolved] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29779.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27085
[https://github.com/apache/spark/pull/27085]

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for future JIRA 
> issues, as the patch for SPARK-29779 is too huge and we decided to break it down.
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration
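
For orientation, the compaction operates on rolling event log files; a spark-shell style sketch of the application-side settings that produce them is below (the config names are the standard rolling event log ones and are an assumption here, since documentation is explicitly left to a follow-up):

{code}
import org.apache.spark.SparkConf

// Sketch: write the event log as rolled files so old ones can later be
// compacted and cleaned up. The directory is a placeholder.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-events")
  .set("spark.eventLog.rolling.enabled", "true")
  .set("spark.eventLog.rolling.maxFileSize", "128m")
{code}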



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29779:
--

Assignee: Jungtaek Lim

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for future JIRA 
> issues, as the patch for SPARK-29779 is too huge and we decided to break it down.
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30281) 'archive' option in FileStreamSource misses to consider partitioned and recursive option

2020-01-08 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30281:
--

Assignee: Jungtaek Lim

> 'archive' option in FileStreamSource misses to consider partitioned and 
> recursive option
> 
>
> Key: SPARK-30281
> URL: https://issues.apache.org/jira/browse/SPARK-30281
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify the condition for verifying the archive path, it relied on the fact that 
> FileStreamSource reads files that meet one of two conditions: 1) the 
> parent directory matches the source pattern, or 2) the file itself matches the 
> source pattern.
> During post-hoc review we found other cases which invalidate the above 
> fact: the partitioned and recursive options. With these options, FileStreamSource 
> can read arbitrary files in subdirectories that match the source 
> pattern, so simply checking the depth of the archive path doesn't work.
> We need to restore the path check logic, though it will not be easy to 
> explain to end users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30281) 'archive' option in FileStreamSource misses to consider partitioned and recursive option

2020-01-08 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30281.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26920
[https://github.com/apache/spark/pull/26920]

> 'archive' option in FileStreamSource misses to consider partitioned and 
> recursive option
> 
>
> Key: SPARK-30281
> URL: https://issues.apache.org/jira/browse/SPARK-30281
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify the condition for verifying the archive path, it relied on the fact that 
> FileStreamSource reads files that meet one of two conditions: 1) the 
> parent directory matches the source pattern, or 2) the file itself matches the 
> source pattern.
> During post-hoc review we found other cases which invalidate the above 
> fact: the partitioned and recursive options. With these options, FileStreamSource 
> can read arbitrary files in subdirectories that match the source 
> pattern, so simply checking the depth of the archive path doesn't work.
> We need to restore the path check logic, though it will not be easy to 
> explain to end users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (0a72dba -> bd7510b)

2020-01-08 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0a72dba  [SPARK-30445][CORE] Accelerator aware scheduling handle 
setting configs to 0
 add bd7510b  [SPARK-30281][SS] Consider partitioned/recursive option while 
verifying archive path on FileStreamSource

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-programming-guide.md |  3 +-
 .../sql/execution/streaming/FileStreamSource.scala | 73 +-
 .../sql/streaming/FileStreamSourceSuite.scala  | 26 ++--
 3 files changed, 80 insertions(+), 22 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Assigned] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2020-01-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30313:
--

Assignee: Jungtaek Lim

> Flaky test: MasterSuite.master/worker web ui available with reverseProxy
> 
>
> Key: SPARK-30313
> URL: https://issues.apache.org/jira/browse/SPARK-30313
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
>
> Saw this test fail a few times on PRs. e.g.:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]
>  
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at 
> org.apache.spark.deploy.master.MasterSuite.$anonfun$new$14(MasterSuite.scala:318)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ---
> Caused by: sbt.ForkMain$ForkError: java.io.IOException: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>   at java.net.URL.openStream(URL.java:1045)
>   at scala.io.Source$.fromURL(Source.scala:144)
>   at scala.io.Source$.fromURL(Source.scala:134)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2020-01-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30313.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27010
[https://github.com/apache/spark/pull/27010]

> Flaky test: MasterSuite.master/worker web ui available with reverseProxy
> 
>
> Key: SPARK-30313
> URL: https://issues.apache.org/jira/browse/SPARK-30313
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Saw this test fail a few times on PRs. e.g.:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]
>  
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at 
> org.apache.spark.deploy.master.MasterSuite.$anonfun$new$14(MasterSuite.scala:318)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ---
> Caused by: sbt.ForkMain$ForkError: java.io.IOException: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>   at java.net.URL.openStream(URL.java:1045)
>   at scala.io.Source$.fromURL(Source.scala:144)
>   at scala.io.Source$.fromURL(Source.scala:134)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (604d679 -> 895e572)

2020-01-06 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 604d679  [SPARK-30226][SQL] Remove withXXX functions in WriteBuilder
 add 895e572  [SPARK-30313][CORE] Ensure EndpointRef is available 
MasterWebUI/WorkerPage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/rpc/netty/Dispatcher.scala| 35 ++
 1 file changed, 23 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2020-01-02 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 16f8fae  [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop 
and AsyncEventQueue#removeListenerOnError
16f8fae is described below

commit 16f8fae01f329d4ba5786176c3c8dc4e648a8c22
Author: Wang Shuo 
AuthorDate: Thu Jan 2 16:40:22 2020 -0800

[SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and 
AsyncEventQueue#removeListenerOnError

There is a deadlock between `LiveListenerBus#stop` and 
`AsyncEventQueue#removeListenerOnError`.

We can reproduce it as follows:

1. Post some events to `LiveListenerBus`.
2. Call `LiveListenerBus#stop` and hold the synchronized lock of
`bus` (https://github.com/apache/spark/blob/5e92301723464d0876b5a7eec59c15fed0c5b98c/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L229),
waiting until all the events are processed by listeners, then remove all the
queues.
3. The event queue drains events by posting them to its listeners. If a
listener is interrupted, it calls `AsyncEventQueue#removeListenerOnError`,
which in turn calls
`bus.removeListener` (https://github.com/apache/spark/blob/7b1b60c7583faca70aeab2659f06d4e491efa5c0/core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala#L207),
trying to acquire the synchronized lock of the bus, resulting in a deadlock.

This PR removes the `synchronized` block from `LiveListenerBus.stop` because
the underlying data structures are themselves thread-safe.

To fix deadlock.

No.

New UT.
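
For illustration, a minimal sketch of the locking pattern described above; ToyBus and ToyQueue are hypothetical stand-ins, not the real LiveListenerBus/AsyncEventQueue:

{code}
// Minimal sketch of the deadlock: stop() holds the bus lock while waiting for
// the dispatch thread, which itself needs the bus lock to remove a listener.
object DeadlockSketch {

  class ToyQueue(bus: ToyBus) {
    private val dispatcher = new Thread(() => {
      Thread.sleep(100)        // simulate event processing
      bus.removeListener()     // a listener failed: call back into the bus
    })
    dispatcher.start()

    def stop(): Unit = dispatcher.join()   // wait for the dispatch thread
  }

  class ToyBus { bus =>
    private val queue = new ToyQueue(bus)

    def removeListener(): Unit = bus.synchronized {
      // mutate the listener list; needs the bus lock
    }

    def stop(): Unit = bus.synchronized {  // holds the bus lock...
      queue.stop()                         // ...while waiting for the dispatcher,
    }                                      // which is blocked on removeListener()
  }

  def main(args: Array[String]): Unit = new ToyBus().stop()  // never returns
}
{code}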

Closes #26924 from wangshuo128/event-queue-race-condition.

Authored-by: Wang Shuo 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit 10cae04108c375a7f5ca7685fea593bd7f49f7a6)
Signed-off-by: Marcelo Vanzin 
---
 .../apache/spark/scheduler/LiveListenerBus.scala   |  6 +-
 .../spark/scheduler/SparkListenerSuite.scala   | 70 ++
 2 files changed, 72 insertions(+), 4 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala 
b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
index d135190..1f42f09 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
@@ -215,10 +215,8 @@ private[spark] class LiveListenerBus(conf: SparkConf) {
   return
 }
 
-synchronized {
-  queues.asScala.foreach(_.stop())
-  queues.clear()
-}
+queues.asScala.foreach(_.stop())
+queues.clear()
   }
 
   // For testing only.
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
index 6ffd1e8..0b843be 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
@@ -531,6 +531,47 @@ class SparkListenerSuite extends SparkFunSuite with 
LocalSparkContext with Match
 }
   }
 
+  Seq(true, false).foreach { throwInterruptedException =>
+val suffix = if (throwInterruptedException) "throw interrupt" else "set 
Thread interrupted"
+test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: 
$suffix") {
+  val LISTENER_BUS_STOP_WAITING_TIMEOUT_MILLIS = 10 * 1000L // 10 seconds
+  val bus = new LiveListenerBus(new SparkConf(false))
+  val counter1 = new BasicJobCounter()
+  val counter2 = new BasicJobCounter()
+  val interruptingListener = new 
DelayInterruptingJobCounter(throwInterruptedException, 3)
+  bus.addToSharedQueue(counter1)
+  bus.addToSharedQueue(interruptingListener)
+  bus.addToEventLogQueue(counter2)
+  assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE))
+  assert(bus.findListenersByClass[BasicJobCounter]().size === 2)
+  assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 
1)
+
+  bus.start(mockSparkContext, mockMetricsSystem)
+
+  (0 until 5).foreach { jobId =>
+bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded))
+  }
+
+  // Call bus.stop in a separate thread, otherwise we will block here 
until bus is stopped
+  val stoppingThread = new Thread(new Runnable() {
+override def run(): Unit = bus.stop()
+  })
+  stoppingThread.start()
+  // Notify interrupting listener starts to work
+  interruptingListener.sleep = false
+  // Wait for bus to stop
+  stoppingThread.join(LISTENER_BUS_STOP_WAITING_TIMEOUT_MILLIS)
+
+  // Stopping has been finished
+  assert(stoppingThread.isAlive === false)
+  // All queues are removed
+  assert(bus.activeQ

[jira] [Assigned] (SPARK-30285) Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2020-01-02 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30285:
--

Assignee: Wang Shuo

> Fix deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError
> ---
>
> Key: SPARK-30285
> URL: https://issues.apache.org/jira/browse/SPARK-30285
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Wang Shuo
>Assignee: Wang Shuo
>Priority: Major
>
> There is a deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError.
> We can reproduce it as follows:
>  # Post some events to LiveListenerBus
>  # Call LiveListenerBus#stop and hold the synchronized lock of bus, waiting
> until all the events are processed by listeners, then remove all the queues
>  # The event queue drains events by posting them to its listeners. If a
> listener is interrupted, it will call AsyncEventQueue#removeListenerOnError,
> which in turn calls bus.removeListener, trying to acquire the synchronized lock
> of the bus, resulting in a deadlock



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30285) Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2020-01-02 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30285.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26924
[https://github.com/apache/spark/pull/26924]

> Fix deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError
> ---
>
> Key: SPARK-30285
> URL: https://issues.apache.org/jira/browse/SPARK-30285
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Wang Shuo
>Assignee: Wang Shuo
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> There is a deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError.
> We can reproduce it as follows:
>  # Post some events to LiveListenerBus
>  # Call LiveListenerBus#stop and hold the synchronized lock of bus, waiting
> until all the events are processed by listeners, then remove all the queues
>  # The event queue drains events by posting them to its listeners. If a
> listener is interrupted, it will call AsyncEventQueue#removeListenerOnError,
> which in turn calls bus.removeListener, trying to acquire the synchronized lock
> of the bus, resulting in a deadlock



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[spark] branch master updated (1b0570c -> 10cae041)

2020-01-02 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1b0570c  [SPARK-30387] Improving stop hook log message
 add 10cae041 [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop 
and AsyncEventQueue#removeListenerOnError

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/LiveListenerBus.scala   |  6 +-
 .../spark/scheduler/SparkListenerSuite.scala   | 70 ++
 2 files changed, 72 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[jira] [Commented] (SPARK-30225) "Stream is corrupted at" exception on reading disk-spilled data of a shuffle operation

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005838#comment-17005838
 ] 

Marcelo Masiero Vanzin commented on SPARK-30225:


The changes in SPARK-23366 may not necessarily have caused this. But that 
change actually flipped the configuration's default value from false to true; 
so 2.3 has the feature disabled by default, and 2.4 has it enabled by default. 
So the bug may have existed in the 2.3 version of the code, making it a bit 
harder to track. Still taking a look at the code but nothing popped up yet...
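
For reference, the workaround quoted below can be applied either on the spark-submit command line or programmatically. A minimal sketch of the programmatic form; only the config key and value come from this report, the session setup and app name are illustrative:

{code}
// Minimal sketch: disable the unsafe spill read-ahead as a workaround.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ReadAheadWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.unsafe.sorter.spill.read.ahead.enabled", "false")

    val spark = SparkSession.builder()
      .config(conf)
      .appName("readahead-workaround-example")
      .getOrCreate()

    // ... run the shuffle-heavy job as usual ...
    spark.stop()
  }
}
{code}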

> "Stream is corrupted at" exception on reading disk-spilled data of a shuffle 
> operation
> --
>
> Key: SPARK-30225
> URL: https://issues.apache.org/jira/browse/SPARK-30225
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> There is an issue with spark.unsafe.sorter.spill.read.ahead.enabled in Spark
> 2.4.0, which was introduced by
> https://issues.apache.org/jira/browse/SPARK-23366
>  
> The workaround for this problem is to disable read-ahead of unsafe spill with
> the following:
>  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false
>  
> This issue can be reproduced on Spark 2.4.0 by following the steps in this 
> comment of Jira SPARK-18105.
> https://issues.apache.org/jira/browse/SPARK-18105?focusedCommentId=16981461=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16981461
>  
> Exception looks like below: 
> {code:java}
> 19/12/10 01:51:31 INFO sort.ShuffleExternalSorter: Thread 142 spilling sort 
> data of 5.1 GB to disk (1  time so far)19/12/10 01:51:31 INFO 
> sort.ShuffleExternalSorter: Thread 142 spilling sort data of 5.1 GB to disk 
> (1  time so far)19/12/10 01:52:48 INFO sort.ShuffleExternalSorter: Thread 142 
> spilling sort data of 5.1 GB to disk (2  times so far)19/12/10 01:53:53 ERROR 
> executor.Executor: Exception in task 6.0 in stage 0.0 (TID 
> 6)java.io.IOException: Stream is corrupted at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202) at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228) at 
> net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157) at 
> org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)19/12/10 01:53:53 INFO 
> executor.CoarseGrainedExecutorBackend: Got assigned task 3319/12/10 01:53:53 
> INFO executor.Executor: Running task 8.1 in stage 0.0 (TID 33)19/12/10 
> 01:54:00 INFO sort.UnsafeExternalSorter: Thread 142 spilling sort data of 3.3 
> GB to disk (0  time so far)19/12/10 01:54:30 INFO executor.Executor: Executor 
> is trying to kill task 8.1 in stage 0.0 (TID 33), reason: Stage 
> cancelled19/12/10 01:54:30 INFO executor.Executor: Executor killed task 8.1 
> in stage 0.0 (TID 33), reason: Stage cancelled19/12/10 01:54:52 INFO 
> executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown{code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-18105:
--

Assignee: Marcelo Masiero Vanzin  (was: Davies Liu)

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
>
> When lz4 is used to compress the shuffle files, decompression may fail with
> "stream is corrupt"
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-18105:
--

Assignee: (was: Marcelo Masiero Vanzin)

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Priority: Major
>
> When lz4 is used to compress the shuffle files, decompression may fail with
> "stream is corrupt"
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Issues with Apache Spark tgz file

2019-12-30 Thread Marcelo Vanzin
That first URL is not the file. It's a web page with links to the file
in different mirrors. I just looked at the actual file in one of the
mirrors and it looks fine.

On Mon, Dec 30, 2019 at 1:34 PM rsinghania  wrote:
>
> Hi,
>
> I'm trying to open the file
> https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> downloaded from https://spark.apache.org/downloads.html using wget, and
> getting the following messages:
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
>
> It looks like there's something wrong with the original tgz file; its size
> is only 32 KB.
>
> Could one of the developers please have a look?
>
> Thanks very much,
> Rajat
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>


-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[spark] branch master updated (c6ab716 -> 7bff2db)

2019-12-23 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c6ab716  [SPARK-29224][ML] Implement Factorization Machines as a 
ml-pipeline component
 add 7bff2db  [SPARK-21869][SS] Revise Kafka producer pool to implement 
'expire' correctly

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/kafka010/CachedKafkaProducer.scala   | 128 -
 .../spark/sql/kafka010/KafkaDataWriter.scala   |  23 +--
 .../apache/spark/sql/kafka010/KafkaWriteTask.scala |  20 +-
 .../org/apache/spark/sql/kafka010/package.scala|   7 +
 .../kafka010/producer/CachedKafkaProducer.scala|  27 ++-
 .../producer/InternalKafkaProducerPool.scala   | 206 +
 .../sql/kafka010/CachedKafkaProducerSuite.scala|  77 
 .../apache/spark/sql/kafka010/KafkaSinkSuite.scala |   2 +-
 .../org/apache/spark/sql/kafka010/KafkaTest.scala  |   3 +-
 .../producer/InternalKafkaProducerPoolSuite.scala  | 192 +++
 10 files changed, 449 insertions(+), 236 deletions(-)
 delete mode 100644 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaProducer.scala
 copy 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/Rule.scala => 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/producer/CachedKafkaProducer.scala
 (58%)
 create mode 100644 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/producer/InternalKafkaProducerPool.scala
 delete mode 100644 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/CachedKafkaProducerSuite.scala
 create mode 100644 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/producer/InternalKafkaProducerPoolSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



