Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas

Thanks for everyone's patience with this email thread. I have fixed my
environmental problem and my tests run cleanly now. This seems to be a
problem which afflicts modern JVMs on Mac OS X (and possibly other Unix
variants). The following can happen on these platforms:

  InetAddress.getLocalHost().isReachable( 2000 ) == false

If this happens to you, the fix is to add the following line to /etc/hosts:

127.0.0.1   localhost $yourMachineName

where $yourMachineName is the result of the hostname command. For more
information, see
http://stackoverflow.com/questions/1881546/inetaddress-getlocalhost-throws-unknownhostexception
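
A quick way to check whether a machine exhibits the symptom, before and after
editing /etc/hosts, is to run the same call from a small standalone program
(or paste the body into spark-shell). This is only a diagnostic sketch built
around the check quoted above; the object name is arbitrary and the 2000 ms
timeout mirrors the value shown:

  import java.net.InetAddress

  object LocalhostCheck {
    def main(args: Array[String]): Unit = {
      // Resolve the local host name and ping it with a 2-second timeout.
      val local = InetAddress.getLocalHost
      val reachable = local.isReachable(2000)
      println(s"getLocalHost = $local, isReachable = $reachable")
      if (!reachable) {
        // Matches the /etc/hosts fix described above.
        println("Consider adding '127.0.0.1 localhost <hostname>' to /etc/hosts")
      }
    }
  }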

Thanks,
-Rick




Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 11:15:29 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev <dev@spark.apache.org>
> Date: 10/15/2015 11:16 AM
> Subject: Re: Network-related environmental problem when running JDBCSuite

>
> Continuing this lively conversation with myself (hopefully this
> archived thread may be useful to someone else in the future):
>
> I set the following environment variable as recommended by this page:
> http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-
> master-using-a-remote-cluster-with-two-workers
>
> export SPARK_LOCAL_IP=127.0.0.1
>
> Then I got errors related to booting the metastore_db. So I deleted
> that directory. After that I was able to run spark-shell again.
>
> Now let's see if this hack fixes the tests...
>
>
> Thanks,
> Rick Hillegas
>
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev <dev@spark.apache.org>
> > Date: 10/15/2015 10:51 AM
> > Subject: Re: Network-related environmental problem when running JDBCSuite
> >
> > For the record, I get the same error when I simply try to boot the
> > spark shell:
> >
> > bash-3.2$ bin/spark-shell
> > log4j:WARN No appenders could be found for logger
> > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> > log4j:WARN Please initialize the log4j system properly.
> > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> > for more info.
> > Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-
> > repl.properties
> > To adjust logging level use sc.setLogLevel("INFO")
> > Welcome to
> >       ____              __
> >      / __/__  ___ _____/ /__
> >     _\ \/ _ \/ _ `/ __/  '_/
> >    /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
> >       /_/
> >
> > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM,
> Java 1.8.0_60)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty 

Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas


I am seeing what look like environmental errors when I try to run a test on
a clean local branch which has been sync'd to the head of the development
trunk. I would appreciate advice about how to debug or hack around this
problem. For the record, the test ran cleanly last week. This is the
experiment I am running:

# build
mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean
package

# run one suite
mvn -Dhadoop.version=2.4.0 -DwildcardSuites=JDBCSuite

The test bombs out before getting to JDBCSuite. I see this summary at the
end...

[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [  2.023 s]
[INFO] Spark Project Test Tags  SUCCESS [  1.924 s]
[INFO] Spark Project Launcher . SUCCESS [  5.837 s]
[INFO] Spark Project Networking ... SUCCESS [ 12.498 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [01:28 min]
[INFO] Spark Project Unsafe ... SUCCESS [01:09 min]
[INFO] Spark Project Core . SUCCESS [02:45 min]
[INFO] Spark Project Bagel  SUCCESS [ 30.182 s]
[INFO] Spark Project GraphX ... SUCCESS [ 59.002 s]
[INFO] Spark Project Streaming  FAILURE [06:21 min]
[INFO] Spark Project Catalyst . SKIPPED
[INFO] Spark Project SQL .. SKIPPED
[INFO] Spark Project ML Library ... SKIPPED
[INFO] Spark Project Tools  SKIPPED
[INFO] Spark Project Hive . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project Assembly . SKIPPED
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External Flume Assembly .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External MQTT Assembly ... SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 13:37 min
[INFO] Finished at: 2015-10-15T09:03:06-07:00
[INFO] Final Memory: 69M/793M
[INFO]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test)
on project spark-streaming_2.10: There are test failures.
[ERROR]
[ERROR] Please refer
to /Users/rhillegas/spark/spark/streaming/target/surefire-reports for the
individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn  -rf :spark-streaming_2.10



From the logs in streaming/target/surefire-reports, it appears that the
following tests failed...

org.apache.spark.streaming.JavaAPISuite.txt
org.apache.spark.streaming.JavaReceiverAPISuite.txt

...with this error:

java.net.BindException: Failed to bind to: /9.52.158.156:0: Service 'sparkDriver' failed after 100 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply

Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas

Continuing this lively conversation with myself (hopefully this archived
thread may be useful to someone else in the future):

I set the following environment variable as recommended by this page:
http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-master-using-a-remote-cluster-with-two-workers

export SPARK_LOCAL_IP=127.0.0.1

Then I got errors related to booting the metastore_db. So I deleted that
directory. After that I was able to run spark-shell again.

Now let's see if this hack fixes the tests...
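
If setting SPARK_LOCAL_IP in the shell environment is awkward, the same
loopback binding can be attempted from inside the application instead. This
is only a sketch of the workaround, and it assumes that the spark.driver.host
setting is honored the same way as the environment variable; the application
name and the tiny sanity-check job are made up for illustration:

  import org.apache.spark.{SparkConf, SparkContext}

  // Bind the driver to the loopback address instead of the machine's
  // unresolvable external address.
  val conf = new SparkConf()
    .setAppName("loopback-bind-check")        // hypothetical app name
    .setMaster("local[*]")
    .set("spark.driver.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  // Trivial job to confirm that the driver actually came up.
  println(sc.parallelize(1 to 10).sum())
  sc.stop()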


Thanks,
Rick Hillegas



Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 10/15/2015 10:51 AM
> Subject: Re: Network-related environmental problem when running JDBCSuite

>
> For the record, I get the same error when I simply try to boot the
> spark shell:
>
> bash-3.2$ bin/spark-shell
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> for more info.
> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-
> repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_60)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to b

Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas
scala:1323)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:100)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:680)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

<console>:10: error: not found: value sqlContext
   import sqlContext.implicits._
  ^
<console>:10: error: not found: value sqlContext
   import sqlContext.sql

Thanks,
Rick Hillegas



Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 09:47:22 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev <dev@spark.apache.org>
> Date: 10/15/2015 09:47 AM
> Subject: Network-related environmental problem when running JDBCSuite
>
> I am seeing what look like environmental errors when I try to run a
> test on a clean local branch which has been sync'd to the head of
> the development trunk. I would appreciate advice about how to debug
> or hack around this problem. For the record, the test ran cleanly
> last week. This is the experiment I am running:
>
> 

Re: unsubscribe

2015-09-30 Thread Richard Hillegas

Hi Sukesh,

To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org. To unsubscribe from the user list, please
send a message to user-unsubscr...@spark.apache.org. Please see:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

sukesh kumar  wrote on 09/28/2015 11:39:01 PM:

> From: sukesh kumar 
> To: "u...@spark.apache.org" ,
> "dev@spark.apache.org" 
> Date: 09/28/2015 11:39 PM
> Subject: unsubscribe
>
> unsubscribe
>
> --
> Thanks & Best Regards
> Sukesh Kumar

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-28 Thread Richard Hillegas
Thanks, Sean!

Sean Owen <so...@cloudera.com> wrote on 09/25/2015 06:35:46 AM:

> From: Sean Owen <so...@cloudera.com>
> To: Reynold Xin <r...@databricks.com>, Richard Hillegas/San
> Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 09/25/2015 07:21 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Work underway at ...
>
> https://issues.apache.org/jira/browse/SPARK-10833
> https://github.com/apache/spark/pull/8919
>
>
>
> On Fri, Sep 25, 2015 at 8:54 AM, Sean Owen <so...@cloudera.com> wrote:
> > Update: I *think* the conclusion was indeed that nothing needs to
> > happen with NOTICE.
> > However, along the way in
> > https://issues.apache.org/jira/browse/LEGAL-226 it emerged that the
> > BSD/MIT licenses should be inlined into LICENSE (or copied in the
> > distro somewhere). I can get on that -- just some grunt work to copy
> > and paste it all.
> >
> > On Thu, Sep 24, 2015 at 6:55 PM, Reynold Xin <r...@databricks.com>
wrote:
> >> Richard,
> >>
> >> Thanks for bringing this up and this is a great point. Let's start
another
> >> thread for it so we don't hijack the release thread.
> >>
> >>
> >>
> >> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen <so...@cloudera.com>
wrote:
> >>>
> >>> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas
<rhil...@us.ibm.com>
> >>> wrote:
> >>> > Under your guidance, I would be happy to help compile a NOTICE file
> >>> > which
> >>> > follows the pattern used by Derby and the JDK. This effort might
proceed
> >>> > in
> >>> > parallel with vetting 1.5.1 and could be targeted at a later
release
> >>> > vehicle. I don't think that the ASF's exposure is greatly increased
by
> >>> > one
> >>> > more release which follows the old pattern.
> >>>
> >>> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> >>> been trying to do and seems like we're even required to do so, not
> >>> follow a different convention. There is some specific guidance there
> >>> about what to add, and not add, to these files. Specifically, because
> >>> the AL2 requires downstream projects to embed the contents of NOTICE,
> >>> the guidance is to only include elements in NOTICE that must appear
> >>> there.
> >>>
> >>> Put it this way -- what would you like to change specifically? (you
> >>> can start another thread for that)
> >>>
> >>> >> My assessment (just looked before I saw Sean's email) is the same
as
> >>> >> his. The NOTICE file embeds other projects' licenses.
> >>> >
> >>> > This may be where our perspectives diverge. I did not find those
> >>> > licenses
> >>> > embedded in the NOTICE file. As I see it, the licenses are cited
but not
> >>> > included.
> >>>
> >>> Pretty sure that was meant to say that NOTICE embeds other projects'
> >>> "notices", not licenses. And those notices can have all kinds of
> >>> stuff, including licenses.
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
> >>>
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-24 Thread Richard Hillegas

Thanks for forking the new email thread, Reynold. It is entirely possible
that I am being overly skittish. I have posed a question for our legal
experts: https://issues.apache.org/jira/browse/LEGAL-226

To answer Sean's question on the previous email thread, I would propose
making changes like the following to the NOTICE file:

Replace a stanza like this...

"This product contains a modified version of 'JZlib', a re-implementation
of
zlib in pure Java, which can be obtained at:

  * LICENSE:
* license/LICENSE.jzlib.txt (BSD Style License)
  * HOMEPAGE:
* http://www.jcraft.com/jzlib/"

...with full license text like this

"This product contains a modified version of 'JZlib', a re-implementation
of
zlib in pure Java, which can be obtained at:

  * HOMEPAGE:
* http://www.jcraft.com/jzlib/

The ZLIB license text follows:

JZlib 0.0.* were released under the GNU LGPL license.  Later, we have
switched
over to a BSD-style license.

--
Copyright (c) 2000-2011 ymnk, JCraft,Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice,
 this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
 notice, this list of conditions and the following disclaimer in
 the documentation and/or other materials provided with the
distribution.

  3. The names of the authors may not be used to endorse or promote
products
 derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JCRAFT,
INC. OR ANY CONTRIBUTORS TO THIS SOFTWARE BE LIABLE FOR ANY DIRECT,
INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."

Thanks,
-Rick



Reynold Xin <r...@databricks.com> wrote on 09/24/2015 10:55:53 AM:

> From: Reynold Xin <r...@databricks.com>
> To: Sean Owen <so...@cloudera.com>
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> <dev@spark.apache.org>
> Date: 09/24/2015 10:56 AM
> Subject: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Richard,
>
> Thanks for bringing this up and this is a great point. Let's start
> another thread for it so we don't hijack the release thread.
>
> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen <so...@cloudera.com> wrote:
> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> > Under your guidance, I would be happy to help compile a NOTICE file
which
> > follows the pattern used by Derby and the JDK. This effort might
proceed in
> > parallel with vetting 1.5.1 and could be targeted at a later release
> > vehicle. I don't think that the ASF's exposure is greatly increased by
one
> > more release which follows the old pattern.
>
> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> been trying to do and seems like we're even required to do so, not
> follow a different convention. There is some specific guidance there
> about what to add, and not add, to these files. Specifically, because
> the AL2 requires downstream projects to embed the contents of NOTICE,
> the guidance is to only include elements in NOTICE that must appear
> there.
>
> Put it this way -- what would you like to change specifically? (you
> can start another thread for that)
>
> >> My assessment (just looked before I saw Sean's email) is the same as
> >> his. The NOTICE file embeds other projects' licenses.
> >
> > This may be where our perspectives diverge. I did not find those
licenses
> > embedded in the NOTICE file. As I see it, the licenses are cited but
not
> > included.
>
> Pretty sure that was meant to say that NOTICE embeds other projects'
> "notices", not licenses. And those notices can have all kinds of
> stuff, including licenses.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-24 Thread Richard Hillegas

Thanks for that pointer, Sean. It may be that Derby is putting the license
information in the wrong place, viz. in the NOTICE file. But the 3rd party
license text may need to go somewhere else. See for instance the advice a
little further up the page at
http://www.apache.org/dev/licensing-howto.html#permissive-deps

Thanks,
-Rick

Sean Owen <so...@cloudera.com> wrote on 09/24/2015 12:07:01 PM:

> From: Sean Owen <so...@cloudera.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 09/24/2015 12:08 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Have a look at http://www.apache.org/dev/licensing-howto.html#mod-notice
> though, which makes a good point about limiting what goes into NOTICE
> to what is required. That's what makes me think we shouldn't do this.
>
> On Thu, Sep 24, 2015 at 7:24 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> > To answer Sean's question on the previous email thread, I would propose
> > making changes like the following to the NOTICE file:
>

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-24 Thread Richard Hillegas

Hi Sean and Wendell,

I share your concerns about how difficult and important it is to get this
right. I think that the Spark community has compiled a very readable and
well organized NOTICE file. A lot of careful thought went into gathering
together 3rd party projects which share the same license text.

All I can offer is my own experience of having served as a release manager
for a sister Apache project (Derby) over the past ten years. The Derby
NOTICE file recites 3rd party licenses verbatim. This is also the approach
taken by the THIRDPARTYLICENSEREADME.txt in the JDK. I am not a lawyer.
However, I have great respect for the experience and legal sensitivities of
the people who compile that JDK license file.

Under your guidance, I would be happy to help compile a NOTICE file which
follows the pattern used by Derby and the JDK. This effort might proceed in
parallel with vetting 1.5.1 and could be targeted at a later release
vehicle. I don't think that the ASF's exposure is greatly increased by one
more release which follows the old pattern.

Another comment inline...

Patrick Wendell <pwend...@gmail.com> wrote on 09/24/2015 10:24:25 AM:

> From: Patrick Wendell <pwend...@gmail.com>
> To: Sean Owen <so...@cloudera.com>
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> <dev@spark.apache.org>
> Date: 09/24/2015 10:24 AM
> Subject: Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Hey Richard,
>
> My assessment (just looked before I saw Sean's email) is the same as
> his. The NOTICE file embeds other projects' licenses.

This may be where our perspectives diverge. I did not find those licenses
embedded in the NOTICE file. As I see it, the licenses are cited but not
included.

Thanks,
-Rick


> If those
> licenses themselves have pointers to other files or dependencies, we
> don't embed them. I think this is standard practice.
>
> - Patrick
>
> On Thu, Sep 24, 2015 at 10:00 AM, Sean Owen <so...@cloudera.com> wrote:
> > Hi Richard, those are messages reproduced from other projects' NOTICE
> > files, not created by Spark. They need to be reproduced in Spark's
> > NOTICE file to comply with the license, but their text may or may not
> > apply to Spark's distribution. The intent is that users would track
> > this back to the source project if interested to investigate what the
> > upstream notice is about.
> >
> > Requirements vary by license, but I do not believe there is additional
> > requirement to reproduce these other files. Their license information
> > is already indicated in accordance with the license terms.
> >
> > What licenses are you looking for in LICENSE that you believe
> should be there?
> >
> > Getting all this right is both difficult and important. I've made some
> > efforts over time to strictly comply with the Apache take on
> > licensing, which is at http://www.apache.org/legal/resolved.html  It's
> > entirely possible there's still a mistake somewhere in here (possibly
> > a new dependency, etc). Please point it out if you see such a thing.
> >
> > But so far what you describe is "working as intended", as far as I
> > know, according to Apache.
> >
> >
> > On Thu, Sep 24, 2015 at 5:52 PM, Richard Hillegas
> <rhil...@us.ibm.com> wrote:
> >> -1 (non-binding)
> >>
> >> I was able to build Spark cleanly from the source distribution using
the
> >> command in README.md:
> >>
> >> build/mvn -DskipTests clean package
> >>
> >> However, while I was waiting for the build to complete, I started
going
> >> through the NOTICE file. I was confused about where to find
> licenses for 3rd
> >> party software bundled with Spark. About halfway through the NOTICE
file,
> >> starting with Java Collections Framework, there is a list of
> licenses of the
> >> form
> >>
> >>license/*.txt
> >>
> >> But there is no license subdirectory in the source distro. I couldn't
find
> >> the  *.txt license files for Java Collections Framework, Base64
Encoder, or
> >> JZlib anywhere in the source distro. I couldn't find those files in
license
> >> subdirectories at the indicated home pages for those projects. (I did
find
> >> the license for JZLIB somewhere else, however:
> >> http://www.jcraft.com/jzlib/LICENSE.txt.)
> >>
> >> In addition, I couldn't find licenses for those projects in the master
> >> LICENSE file.
> >>
> >> Are users supposed to get licenses from the indicated 3rd party web
sites?
> >> Those online licenses could change. I would feel more comfortableif
the ASF
> >> w

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-24 Thread Richard Hillegas

-1 (non-binding)

I was able to build Spark cleanly from the source distribution using the
command in README.md:

build/mvn -DskipTests clean package

However, while I was waiting for the build to complete, I started going
through the NOTICE file. I was confused about where to find licenses for
3rd party software bundled with Spark. About halfway through the NOTICE
file, starting with Java Collections Framework, there is a list of licenses
of the form

   license/*.txt

But there is no license subdirectory in the source distro. I couldn't find
the  *.txt license files for Java Collections Framework, Base64 Encoder, or
JZlib anywhere in the source distro. I couldn't find those files in license
subdirectories at the indicated home pages for those projects. (I did find
the license for JZLIB somewhere else, however:
http://www.jcraft.com/jzlib/LICENSE.txt.)

In addition, I couldn't find licenses for those projects in the master
LICENSE file.

Are users supposed to get licenses from the indicated 3rd party web sites?
Those online licenses could change. I would feel more comfortable if the
ASF were protected by our bundling the licenses inside our source distros.

After looking for those three licenses, I stopped reading the NOTICE file.
Maybe I'm confused about how to read the NOTICE file. Where should users
expect to find the 3rd party licenses?

Thanks,
-Rick

Reynold Xin  wrote on 09/24/2015 12:27:25 AM:

> From: Reynold Xin 
> To: "dev@spark.apache.org" 
> Date: 09/24/2015 12:28 AM
> Subject: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Please vote on releasing the following candidate as Apache Spark
> version 1.5.1. The vote is open until Sun, Sep 27, 2015 at 10:00 UTC
> and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.5.1
> [ ] -1 Do not release this package because ...
>
> The release fixes 81 known issues in Spark 1.5.0, listed here:
> http://s.apache.org/spark-1.5.1
>
> The tag to be voted on is v1.5.1-rc1:
> https://github.com/apache/spark/commit/
> 4df97937dbf68a9868de58408b9be0bf87dbbb94
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release (1.5.1) can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1148/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-docs/
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate,
> then reporting any regressions.
>
> 
> What justifies a -1 vote for this release?
> 
> -1 vote should occur for regressions from Spark 1.5.0. Bugs already
> present in 1.5.0 will not block this release.
>
> ===
> What should happen to JIRA tickets still targeting 1.5.1?
> ===
> Please target 1.5.2 or 1.6.0.

column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas


I am puzzled by the behavior of column identifiers in Spark SQL. I don't
find any guidance in the "Spark SQL and DataFrame Guide" at
http://spark.apache.org/docs/latest/sql-programming-guide.html. I am seeing
odd behavior related to case-sensitivity and to delimited (quoted)
identifiers.

Consider the following declaration of a table in the Derby relational
database, whose dialect hews closely to the SQL Standard:

   create table app.t( a int, "b" int, "c""d" int );

Now let's load that table into Spark like this:

  import org.apache.spark.sql._
  import org.apache.spark.sql.types._

  val df = sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
"dbtable" -> "app.t")).load()
  df.registerTempTable("test_data")

The following query runs fine because the column name matches the
normalized form in which it is stored in the metadata catalogs of the
relational database:

  // normalized column names are recognized
  sqlContext.sql(s"""select A from test_data""").show

But the following query fails during name resolution. This puzzles me
because non-delimited identifiers are case-insensitive in the ANSI/ISO
Standard. They are also supposed to be case-insensitive in HiveQL, at least
according to section 2.3.1 of the QuotedIdentifier.html webpage attached to
https://issues.apache.org/jira/browse/HIVE-6013:

  // ...unnormalized column names raise this error:
org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input
columns A, b, c"d;
  sqlContext.sql("""select a from test_data""").show

Delimited (quoted) identifiers are treated as string literals. Again,
non-Standard behavior:

  // this returns rows consisting of the string literal "b"
  sqlContext.sql("""select "b" from test_data""").show

Embedded quotes in delimited identifiers won't even parse:

  // embedded quotes raise this error: java.lang.RuntimeException: [1.11]
failure: ``union'' expected but "d" found
  sqlContext.sql("""select "c""d" from test_data""").show

This behavior is non-Standard and it strikes me as hard to describe to
users concisely. Would the community support an effort to bring the
handling of column identifiers into closer conformance with the Standard?
Would backward compatibility concerns even allow us to do that?

Thanks,
-Rick

Derby version in Spark

2015-09-22 Thread Richard Hillegas


I see that lib_managed/jars holds these old Derby versions:

  lib_managed/jars/derby-10.10.1.1.jar
  lib_managed/jars/derby-10.10.2.0.jar

The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and
Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running
Spark on the resource-constrained Java ME platform. Is Spark really
deployed on Java SE 5? Is there some other reason that Spark uses the 10.10
Derby family?

If no-one needs those ancient JVMs, maybe we could consider changing the
Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1 release (both
run on Java 6 and up).

Thanks,
-Rick

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas

Thanks for that tip, Michael. I think that my sqlContext was a raw
SQLContext originally. I have rebuilt Spark like so...

  sbt/sbt -Phive assembly/assembly

Now I see that my sqlContext is a HiveContext. That fixes one of the
queries. Now unnormalized column names work:

  // ...unnormalized column names work now
  sqlContext.sql("""select a from test_data""").show
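
(A hedged side note: if rebuilding with -Phive is not an option, a raw
SQLContext can reportedly be told to ignore identifier case through its
configuration. I believe spark.sql.caseSensitive is the relevant setting, but
I have not verified it against this table, so treat this as a sketch:)

  // Assumed workaround for a plain SQLContext; property name taken on trust.
  sqlContext.setConf("spark.sql.caseSensitive", "false")
  sqlContext.sql("""select a from test_data""").show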

However, quoted identifiers are still treated as string literals:

  // this still returns rows consisting of the string literal "b"
  sqlContext.sql("""select "b" from test_data""").show

And embedded quotes inside quoted identifiers are swallowed up:

  // this now returns rows consisting of the string literal "cd"
  sqlContext.sql("""select "c""d" from test_data""").show

Thanks,
-Rick

Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 10:58:36 AM:

> From: Michael Armbrust <mich...@databricks.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 10:59 AM
> Subject: Re: column identifiers in Spark SQL
>
> Are you using a SQLContext or a HiveContext?  The programming guide
> suggests the latter, as the former is really only there because some
> applications may have conflicts with Hive dependencies.  SQLContext
> is case sensitive by default where as the HiveContext is not.  The
> parser in HiveContext is also a lot better.
>
> On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> I am puzzled by the behavior of column identifiers in Spark SQL. I
> don't find any guidance in the "Spark SQL and DataFrame Guide" at
> http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> seeing odd behavior related to case-sensitivity and to delimited
> (quoted) identifiers.
>
> Consider the following declaration of a table in the Derby
> relational database, whose dialect hews closely to the SQL Standard:
>
>    create table app.t( a int, "b" int, "c""d" int );
>
> Now let's load that table into Spark like this:
>
>   import org.apache.spark.sql._
>   import org.apache.spark.sql.types._
>
>   val df = sqlContext.read.format("jdbc").options(
>     Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
>     "dbtable" -> "app.t")).load()
>   df.registerTempTable("test_data")
>
> The following query runs fine because the column name matches the
> normalized form in which it is stored in the metadata catalogs of
> the relational database:
>
>   // normalized column names are recognized
>   sqlContext.sql(s"""select A from test_data""").show
>
> But the following query fails during name resolution. This puzzles
> me because non-delimited identifiers are case-insensitive in the
> ANSI/ISO Standard. They are also supposed to be case-insensitive in
> HiveQL, at least according to section 2.3.1 of the
> QuotedIdentifier.html webpage attached to https://issues.apache.org/
> jira/browse/HIVE-6013:
>
>   // ...unnormalized column names raise this error:
> org.apache.spark.sql.AnalysisException: cannot resolve 'a' given
> input columns A, b, c"d;
>   sqlContext.sql("""select a from test_data""").show
>
> Delimited (quoted) identifiers are treated as string literals.
> Again, non-Standard behavior:
>
>   // this returns rows consisting of the string literal "b"
>   sqlContext.sql("""select "b" from test_data""").show
>
> Embedded quotes in delimited identifiers won't even parse:
>
>   // embedded quotes raise this error: java.lang.RuntimeException:
> [1.11] failure: ``union'' expected but "d" found
>   sqlContext.sql("""select "c""d" from test_data""").show
>
> This behavior is non-Standard and it strikes me as hard to describe
> to users concisely. Would the community support an effort to bring
> the handling of column identifiers into closer conformance with the
> Standard? Would backward compatibility concerns even allow us to do that?
>
> Thanks,
> -Rick

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas

Thanks, Ted. I'll follow up with the Hive folks.

Cheers,
-Rick

Ted Yu <yuzhih...@gmail.com> wrote on 09/22/2015 03:41:12 PM:

> From: Ted Yu <yuzhih...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 03:41 PM
> Subject: Re: Derby version in Spark
>
> I cloned Hive 1.2 code base and saw:
>
>     10.10.2.0
>
> So the version used by Spark is quite close to what Hive uses.
>
> On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I see.
> I use maven to build so I observe different contents under
> lib_managed directory.
>
> Here is snippet of dependency tree:
>
> [INFO] |  +-
org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile
> [INFO] |  |  +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
> [INFO] |  |  +- org.apache.derby:derby:jar:10.10.1.1:compile
>
> On Tue, Sep 22, 2015 at 3:21 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Thanks, Ted. I'm working on my master branch. The lib_managed/jars
> directory has a lot of jarballs, including hadoop and hive. Maybe
> these were faulted in when I built with the following command?
>
>   sbt/sbt -Phive assembly/assembly
>
> The Derby jars seem to be used in order to manage the metastore_db
> database. Maybe my question should be directed to the Hive community?
>
> Thanks,
> -Rick
>
> Here are the gory details:
>
> bash-3.2$ ls lib_managed/jars
> FastInfoset-1.2.12.jar curator-test-2.4.0.jar jersey-test-framework-
> grizzly2-1.9.jar parquet-format-2.3.0-incubating.jar
> JavaEWAH-0.3.2.jar datanucleus-api-jdo-3.2.6.jar jets3t-0.7.1.jar
> parquet-generator-1.7.0.jar
> ST4-4.0.4.jar datanucleus-core-3.2.10.jar jetty-continuation-8.1.
> 14.v20131031.jar parquet-hadoop-1.7.0.jar
> activation-1.1.jar datanucleus-rdbms-3.2.9.jar jetty-http-8.1.
> 14.v20131031.jar parquet-hadoop-bundle-1.6.0.jar
> akka-actor_2.10-2.3.11.jar derby-10.10.1.1.jar jetty-io-8.1.
> 14.v20131031.jar parquet-jackson-1.7.0.jar
> akka-remote_2.10-2.3.11.jar derby-10.10.2.0.jar jetty-jndi-8.1.
> 14.v20131031.jar platform-3.4.0.jar
> akka-slf4j_2.10-2.3.11.jar genjavadoc-plugin_2.10.4-0.9-spark0.jar
> jetty-plus-8.1.14.v20131031.jar pmml-agent-1.1.15.jar
> akka-testkit_2.10-2.3.11.jar groovy-all-2.1.6.jar jetty-security-8.
> 1.14.v20131031.jar pmml-model-1.1.15.jar
> antlr-2.7.7.jar guava-11.0.2.jar jetty-server-8.1.14.v20131031.jar
> pmml-schema-1.1.15.jar
> antlr-runtime-3.4.jar guice-3.0.jar jetty-servlet-8.1.
> 14.v20131031.jar postgresql-9.3-1102-jdbc41.jar
> aopalliance-1.0.jar h2-1.4.183.jar jetty-util-6.1.26.jar py4j-0.8.2.1.jar
> arpack_combined_all-0.1-javadoc.jar hadoop-annotations-2.2.0.jar
> jetty-util-8.1.14.v20131031.jar pyrolite-4.4.jar
> arpack_combined_all-0.1.jar hadoop-auth-2.2.0.jar jetty-webapp-8.1.
> 14.v20131031.jar quasiquotes_2.10-2.0.0.jar
> asm-3.2.jar hadoop-client-2.2.0.jar jetty-websocket-8.1.
> 14.v20131031.jar reflectasm-1.07-shaded.jar
> avro-1.7.4.jar hadoop-common-2.2.0.jar jetty-xml-8.1.
> 14.v20131031.jar sac-1.3.jar
> avro-1.7.7.jar hadoop-hdfs-2.2.0.jar jline-0.9.94.jar scala-
> compiler-2.10.0.jar
> avro-ipc-1.7.7-tests.jar hadoop-mapreduce-client-app-2.2.0.jar
> jline-2.10.4.jar scala-compiler-2.10.4.jar
> avro-ipc-1.7.7.jar hadoop-mapreduce-client-common-2.2.0.jar jline-2.
> 12.jar scala-library-2.10.4.jar
> avro-mapred-1.7.7-hadoop2.jar hadoop-mapreduce-client-core-2.2.0.jar
> jna-3.4.0.jar scala-reflect-2.10.4.jar
> breeze-macros_2.10-0.11.2.jar hadoop-mapreduce-client-jobclient-2.2.
> 0.jar joda-time-2.5.jar scalacheck_2.10-1.11.3.jar
> breeze_2.10-0.11.2.jar hadoop-mapreduce-client-shuffle-2.2.0.jar
> jodd-core-3.5.2.jar scalap-2.10.0.jar
> calcite-avatica-1.2.0-incubating.jar hadoop-yarn-api-2.2.0.jar
> json-20080701.jar selenium-api-2.42.2.jar
> calcite-core-1.2.0-incubating.jar hadoop-yarn-client-2.2.0.jar
> json-20090211.jar selenium-chrome-driver-2.42.2.jar
> calcite-linq4j-1.2.0-incubating.jar hadoop-yarn-common-2.2.0.jar
> json4s-ast_2.10-3.2.10.jar selenium-firefox-driver-2.42.2.jar
> cglib-2.2.1-v20090111.jar hadoop-yarn-server-common-2.2.0.jar
> json4s-core_2.10-3.2.10.jar selenium-htmlunit-driver-2.42.2.jar
> cglib-nodep-2.1_3.jar hadoop-yarn-server-nodemanager-2.2.0.jar
> json4s-jackson_2.10-3.2.10.jar selenium-ie-driver-2.42.2.jar
> chill-java-0.5.0.jar hamcrest-core-1.1.jar jsr173_api-1.0.jar
> selenium-java-2.42.2.jar
> chill_2.10-0.5.0.jar hamcrest-core-1.3.jar jsr305-1.3.9.jar
> selenium-remote-driver-2.42.2.jar
> commons-beanutils-1.7.0.jar hamcrest-library-1.3.jar jsr305-2.0.
> 1.jar selenium-safari-driver-2.42.2.jar
> commons-beanutils-core-1.8.0.jar hive-exec-1.2.1.spark.jar jta-1.
> 1.jar selenium-support-2.42.2.jar
&

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas

Thanks for that additional tip, Michael. Backticks fix the problem query in
which an identifier was transformed into a string literal. So this works
now...

  // now correctly resolves the unnormalized column id
  sqlContext.sql("""select `b` from test_data""").show

Any suggestion about how to escape an embedded double quote?

  // java.sql.SQLSyntaxErrorException: Syntax error: Encountered "\"" at
line 1, column 12.
  sqlContext.sql("""select `c"d` from test_data""").show

  // org.apache.spark.sql.AnalysisException: cannot resolve 'c\"d' given
input columns A, b, c"d; line 1 pos 7
  sqlContext.sql("""select `c\"d` from test_data""").show
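
If the goal is just to read that column rather than to quote it in SQL, one
possible workaround is to skip the SQL parser and go through the DataFrame
API, which takes column names as plain strings. I have not verified this
against a column with an embedded quote, so treat it as a sketch:

  // Look up the registered temp table and select columns by their stored names.
  val df = sqlContext.table("test_data")
  df.select(df("b")).show()                         // the lower-case "b" column, no quoting needed
  df.select(df.columns.map(c => df(c)): _*).show()  // every column, by its exact stored name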

Thanks,
-Rick

Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 01:16:12 PM:

> From: Michael Armbrust <mich...@databricks.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 01:16 PM
> Subject: Re: column identifiers in Spark SQL
>
> HiveQL uses `backticks` for quoted identifiers.
>
> On Tue, Sep 22, 2015 at 1:06 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Thanks for that tip, Michael. I think that my sqlContext was a raw
> SQLContext originally. I have rebuilt Spark like so...
>
>   sbt/sbt -Phive assembly/assembly
>
> Now I see that my sqlContext is a HiveContext. That fixes one of the
> queries. Now unnormalized column names work:
>
>   // ...unnormalized column names work now
>   sqlContext.sql("""select a from test_data""").show
>
> However, quoted identifiers are still treated as string literals:
>
>   // this still returns rows consisting of the string literal "b"
>   sqlContext.sql("""select "b" from test_data""").show
>
> And embedded quotes inside quoted identifiers are swallowed up:
>
>   // this now returns rows consisting of the string literal "cd"
>   sqlContext.sql("""select "c""d" from test_data""").show
>
> Thanks,
> -Rick
>
> Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 10:58:36
AM:
>
> > From: Michael Armbrust <mich...@databricks.com>
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev <dev@spark.apache.org>
> > Date: 09/22/2015 10:59 AM
> > Subject: Re: column identifiers in Spark SQL
>
> >
> > Are you using a SQLContext or a HiveContext?  The programming guide
> > suggests the latter, as the former is really only there because some
> > applications may have conflicts with Hive dependencies.  SQLContext
> > is case sensitive by default where as the HiveContext is not.  The
> > parser in HiveContext is also a lot better.
> >
> > On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas <rhil...@us.ibm.com
> > wrote:
> > I am puzzled by the behavior of column identifiers in Spark SQL. I
> > don't find any guidance in the "Spark SQL and DataFrame Guide" at
> > http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> > seeing odd behavior related to case-sensitivity and to delimited
> > (quoted) identifiers.
> >
> > Consider the following declaration of a table in the Derby
> > relational database, whose dialect hews closely to the SQL Standard:
> >
> >    create table app.t( a int, "b" int, "c""d" int );
> >
> > Now let's load that table into Spark like this:
> >
> >   import org.apache.spark.sql._
> >   import org.apache.spark.sql.types._
> >
> >   val df = sqlContext.read.format("jdbc").options(
> >     Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
> >     "dbtable" -> "app.t")).load()
> >   df.registerTempTable("test_data")
> >
> > The following query runs fine because the column name matches the
> > normalized form in which it is stored in the metadata catalogs of
> > the relational database:
> >
> >   // normalized column names are recognized
> >   sqlContext.sql(s"""select A from test_data""").show
> >
> > But the following query fails during name resolution. This puzzles
> > me because non-delimited identifiers are case-insensitive in the
> > ANSI/ISO Standard. They are also supposed to be case-insensitive in
> > HiveQL, at least according to section 2.3.1 of the
> > QuotedIdentifier.html webpage attached to https://issues.apache.org/
> > jira/browse/HIVE-6013:
> >
> >   // 

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
.jar  htmlunit-2.14.jar
jul-to-slf4j-1.7.10.jar slf4j-api-1.7.10.jar
commons-codec-1.4.jar   htmlunit-core-js-2.14.jar
junit-4.10.jar  slf4j-log4j12-1.7.10.jar
commons-codec-1.5.jar   httpclient-4.3.2.jar
junit-dep-4.10.jar  snappy-0.2.jar
commons-codec-1.9.jar   httpcore-4.3.1.jar
junit-dep-4.8.2.jar 
spire-macros_2.10-0.7.4.jar
commons-collections-3.2.1.jar   httpmime-4.3.2.jar
junit-interface-0.10.jarspire_2.10-0.7.4.jar
commons-compiler-2.7.8.jar  istack-commons-runtime-2.16.jar
junit-interface-0.9.jar 
stax-api-1.0.1.jar
commons-compress-1.4.1.jar  ivy-2.4.0.jar
libfb303-0.9.2.jar  stream-2.7.0.jar
commons-configuration-1.6.jar   jackson-core-asl-1.8.8.jar
libthrift-0.9.2.jar stringtemplate-3.2.1.jar
commons-dbcp-1.4.jarjackson-core-asl-1.9.13.jar
lz4-1.3.0.jar   tachyon-client-0.7.1.jar
commons-digester-1.8.jarjackson-jaxrs-1.8.8.jar
mesos-0.21.1-shaded-protobuf.jar
tachyon-underfs-hdfs-0.7.1.jar
commons-exec-1.1.jarjackson-mapper-asl-1.9.13.jar
minlog-1.2.jar
tachyon-underfs-local-0.7.1.jar
commons-httpclient-3.1.jar  jackson-xc-1.8.8.jar
mockito-core-1.9.5.jar  test-interface-0.5.jar
commons-io-2.1.jar  janino-2.7.8.jar
mysql-connector-java-5.1.34.jar test-interface-1.0.jar
commons-io-2.4.jar  jansi-1.4.jar
nekohtml-1.9.20.jar 
uncommons-maths-1.2.2a.jar
commons-lang-2.5.jarjavassist-3.15.0-GA.jar
netty-all-4.0.29.Final.jar  unused-1.0.0.jar
commons-lang-2.6.jarjavax.inject-1.jar
objenesis-1.0.jar   webbit-0.4.14.jar
commons-lang3-3.3.2.jar jaxb-api-2.2.2.jar
objenesis-1.2.jar   xalan-2.7.1.jar
commons-logging-1.1.3.jar   jaxb-api-2.2.7.jar
opencsv-2.3.jar xercesImpl-2.11.0.jar
commons-math-2.1.jarjaxb-core-2.2.7.jar
oro-2.0.8.jar   xml-apis-1.4.01.jar
commons-math-2.2.jarjaxb-impl-2.2.3-1.jar
paranamer-2.3.jar   xmlenc-0.52.jar
commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar
paranamer-2.6.jar   xz-1.0.jar
commons-net-3.1.jar jblas-1.2.4.jar
parquet-avro-1.7.0.jar  zookeeper-3.4.5.jar
commons-pool-1.5.4.jar  jcl-over-slf4j-1.7.10.jar
parquet-column-1.7.0.jar
core-1.1.2.jar  jdo-api-3.0.1.jar
parquet-common-1.7.0.jar
cssparser-0.9.13.jarjersey-guice-1.9.jar
parquet-encoding-1.7.0.jar

Ted Yu <yuzhih...@gmail.com> wrote on 09/22/2015 01:32:39 PM:

> From: Ted Yu <yuzhih...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 01:33 PM
> Subject: Re: Derby version in Spark
>
> Which Spark release are you building ?
>
> For master branch, I get the following:
>
> lib_managed/jars/datanucleus-api-jdo-3.2.6.jar  lib_managed/jars/
> datanucleus-core-3.2.10.jar  lib_managed/jars/datanucleus-rdbms-3.2.9.jar
>
> FYI
>
> On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> I see that lib_managed/jars holds these old Derby versions:
>
>   lib_managed/jars/derby-10.10.1.1.jar
>   lib_managed/jars/derby-10.10.2.0.jar
>
> The Derby 10.10 release family supports some ancient JVMs: Java SE 5
> and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone
> running Spark on the resource-constrained Java ME platform. Is Spark
> really deployed on Java SE 5? Is there some other reason that Spark
> uses the 10.10 Derby family?
>
> If no-one needs those ancient JVMs, maybe we could consider changing
> the Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1
> release (both run on Java 6 and up).
>
> Thanks,
> -Rick

Re: Unsubscribe

2015-09-21 Thread Richard Hillegas

To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org as described here:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

Dulaj Viduranga  wrote on 09/21/2015 10:15:58 AM:

> From: Dulaj Viduranga 
> To: dev@spark.apache.org
> Date: 09/21/2015 10:16 AM
> Subject: Unsubscribe
>
> Unsubscribe
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>