[ANNOUNCE] Apache Derby 10.17.1.0 released

2023-11-16 Thread Richard Hillegas

The Apache Derby project is pleased to announce feature release 10.17.1.0.

Apache Derby is a sub-project of the Apache DB project. Derby is a pure 
Java relational database engine which conforms to the ISO/ANSI SQL and 
JDBC standards. Derby aims to be easy for developers and end-users to 
work with.


The chief feature of this release is the removal of calls to deprecated 
Java APIs. Derby 10.17.1.0 has been built and tested on the Java SE 21 
platform, and will run only on Java SE 21 and newer Java platforms. 
Derby 10.17.1.0 cannot be used with older Java platforms. Please see 
http://db.apache.org/derby/derby_downloads.html for more details 
regarding which Derby releases are compatible with which Java platforms.


In addition, Derby 10.17.1.0 fixes a flaw in Derby's LDAP authentication 
logic (CVE-2022-46337).


Derby 10.17.1.0 contains other bug and documentation fixes. The release 
can be obtained from the Derby download site:


http://db.apache.org/derby/derby_downloads.html.

Please try out this new release.



new hurdle for applications which programmatically install a SecurityManager

2021-11-18 Thread Richard Hillegas
Build 18-ea+23-1525 has introduced another hurdle for applications which 
use the SecurityManager. In order to install a SecurityManager, you now 
have to set -Djava.security.manager=allow on the boot command line. This 
property cannot be set programmatically, unlike the other system 
properties related to the SecurityManager. I have attached a simple 
repro of this asymmetry (DERBY_7126_B) to 
https://issues.apache.org/jira/browse/DERBY-7126. The repro 
programmatically sets java.security.manager. Here's the code:


import java.io.PrintWriter;
import java.util.Properties;

/**
 * Demonstrate that the SecurityManager cannot be installed by setting
 * java.security.manager programmatically; the property must be set on
 * the boot command line.
 */
@SuppressWarnings("removal")
public class DERBY_7126_B
{
    private static final String PROPERTY_FILE_NAME =
        "/tmp/derby-7126_B.properties";
    private static final String SECURITY_POLICY_FILE_NAME =
        "/tmp/derby-7126_B.policy";
    private static final String SECURITY_POLICY_FILE_URL =
        "file:" + SECURITY_POLICY_FILE_NAME;

    private static final String POLICY_FILE_PROPERTY =
        "java.security.policy";

    private static final String SECURITY_FILE_CONTENTS =
        "grant\n" +
        "{\n" +
        "  permission java.io.FilePermission \"/tmp/-\", \"read,write,delete\";\n" +
        "};\n";

    public static void main(String... args) throws Exception
    {
        // write the policy file
        try (PrintWriter pw = new PrintWriter(SECURITY_POLICY_FILE_NAME))
        { pw.write(SECURITY_FILE_CONTENTS); }

        // start up a security manager using the policy file we just wrote
        System.setProperty( POLICY_FILE_PROPERTY, SECURITY_POLICY_FILE_URL );

        // setting the property programmatically has no effect here...
        System.setProperty( "java.security.manager", "allow" );

        // ...so this call fails unless the property was also set on the boot command line
        System.setSecurityManager( new SecurityManager() );
    }
}

Here's the output I get when I run that program against 18-ea+23-1525 
WITHOUT setting java.security.manager on the boot command line:


Exception in thread "main" java.lang.UnsupportedOperationException: The 
Security Manager is deprecated and will be removed in a future release

    at java.base/java.lang.System.setSecurityManager(System.java:411)
    at DERBY_7126_B.main(DERBY_7126_B.java:34)

Here's the output I get when I run that program against 18-ea+23-1525 
but do set java.security.manager on the boot command line:


WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by DERBY_7126_B 
(file:/Users/rhillegas/src/)

WARNING: Please consider reporting this to the maintainers of DERBY_7126_B
WARNING: System::setSecurityManager will be removed in a future release
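
(For reference, the two runs above correspond roughly to these 
invocations, assuming DERBY_7126_B has been compiled in the current 
directory:)

    java DERBY_7126_B
    java -Djava.security.manager=allow DERBY_7126_B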

Is this asymmetry in the handling of this new system property 
deliberate? If so, what is the motivation for it? If not, 
can the new property be made to operate like the other SecurityManager 
properties, that is, can the JDK be amended so that 
java.security.manager can be set programmatically?


Thanks,
-Rick



Re: Accessing transitional tables from trigger procedures

2021-08-13 Thread Richard Hillegas
Trigger transition tables can only be used for row-scoped triggers (FOR
EACH ROW triggers). They cannot be used for statement-scoped triggers (FOR
EACH STATEMENT triggers). I suppose that you could create a row-scoped
trigger which populates a scratch table from the values of the transition
table. Then you could create a statement-scoped trigger which fires a
database procedure to process the scratch table.
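
For what it's worth, here is a rough sketch of that two-trigger 
arrangement, driven through JDBC. The database name, table names, 
columns, and the enforce_counts procedure below are illustrative 
assumptions, not taken from the original schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ScratchTableTriggers
{
    public static void main(String... args) throws Exception
    {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
             Statement s = conn.createStatement())
        {
            // row-scoped trigger: copy each updated row into a scratch table
            s.execute(
                "CREATE TRIGGER copy_updated_rows " +
                "AFTER UPDATE ON item_locations " +
                "REFERENCING NEW AS updated_row " +
                "FOR EACH ROW " +
                "INSERT INTO scratch_updates " +
                "VALUES (updated_row.item_id, updated_row.item_count)");

            // statement-scoped trigger: fire once per statement and call a
            // procedure which processes the scratch table
            s.execute(
                "CREATE TRIGGER enforce_counts_trigger " +
                "AFTER UPDATE ON item_locations " +
                "FOR EACH STATEMENT " +
                "CALL enforce_counts()");
        }
    }
}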

On Fri, Aug 13, 2021 at 11:02 AM Mark Raynsford <
list+org.apache.db.derby-u...@io7m.com> wrote:

> Hello!
>
> Is it supposed to be possible to access the transitional tables
> produced in triggers? For example, I need to write the following:
>
> --
> create trigger cardant.item_locations_enforce_counts_update_trigger
>   after update on cardant.item_locations
> referencing old_table as new_item_locations
>   for each statement
>  call cardant.item_locations_enforce_counts ()
> --
>
> .. Where cardant.item_locations_enforce_counts is a JDBC/Java method
> that tries to check the rows of new_item_locations. Unfortunately,
> doing so just results in:
>
> java.sql.SQLSyntaxErrorException: Table/View 'NEW_ITEM_LOCATIONS' does
> not exist.
>
> --
> Mark Raynsford | https://www.io7m.com
>
>


Re: -jar option and the modulepath

2018-11-23 Thread Richard Hillegas

Thanks, Alan.

On 11/23/18 12:06 AM, Alan Bateman wrote:

On 22/11/2018 19:27, Richard Hillegas wrote:
Can I scribble something in a jar file manifest which will cause 
"java -jar" to boot with a modulepath rather than a classpath? I do 
not see any support for a modulepath attribute in the Java 9 jar file 
documentation at 
https://docs.oracle.com/javase/9/docs/specs/jar/jar.html. My sense is 
that the -jar option commits the JVM to using a classpath.
There is no support for executable modular JARs at this time. It's 
part of a bigger topic that is tracked as #MultiModuleExecutableJARs. 
There were some prototypes during JDK 9, but the decision at the time 
was to defer it to some future effort. So when you run with `java -jar`, 
the JAR file is put on the class path (exactly as in all previous 
releases).


-Alan
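
For contrast, launching from the module path looks roughly like this 
(the module and main-class names here are purely illustrative, since 
-jar offers no equivalent today):

    java --module-path mods --module com.example.app/com.example.Main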





-jar option and the modulepath

2018-11-22 Thread Richard Hillegas
Can I scribble something in a jar file manifest which will cause "java 
-jar" to boot with a modulepath rather than a classpath? I do not see 
any support for a modulepath attribute in the Java 9 jar file 
documentation at 
https://docs.oracle.com/javase/9/docs/specs/jar/jar.html. My sense is 
that the -jar option commits the JVM to using a classpath.


Thanks,
-Rick



Re: speed of class loading via a modulepath

2018-11-19 Thread Richard Hillegas

Thanks, Rémi and Alan.

On 11/19/18 12:53 AM, Alan Bateman wrote:

On 18/11/2018 20:00, Richard Hillegas wrote:
I am updating Apache Derby documentation to reflect the recent 
modularization of the codeline. While doing this, I have stumbled 
across an old piece of advice from the Derby Tuning Guide:


"The structure of your classpath can affect Derby startup time and 
the time required to load a particular class.


The classpath is searched linearly, so locate Derby's libraries at 
the beginning of the classpath so that they are found first. If the 
classpath first points to a directory that contains multiple files, 
booting Derby can be very slow."


That may be an old, Java 1.2 concern, which no longer affects modern 
JVMs. I have a couple questions:


1) Is this still good advice when booting a large application like 
Derby via a classpath?


2) What about the modulepath? Can classes be faulted in faster by 
re-arranging the order of jar files on the modulepath?
The position of a directory or module on the module path won't 
impact class/resource loading, as modules are accessed directly (as 
Remi notes), so there is no linear scan/search after startup. Ordering 
is of course important when you end up with multiple versions of the 
same module on the path; in that case the first version of a module wins.


One other thing to be aware of is that the initial scanning of the 
module path can be slow when it contains lots of automatic modules or 
modules that have been packaged with the jar tool from JDK 8 or older. 
Explicit modules that are packaged with the JDK 9 (or newer) jar tool 
are indexed at packaging time to avoid scanning the contents at startup.


-Alan
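
For what it's worth, repackaging an explicit module with a current jar 
tool, so that its contents are indexed at packaging time, looks roughly 
like this (file and class names are illustrative):

    jar --create --file mods/com.example.app.jar \
        --main-class com.example.Main \
        -C build/classes .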





speed of class loading via a modulepath

2018-11-18 Thread Richard Hillegas
I am updating Apache Derby documentation to reflect the recent 
modularization of the codeline. While doing this, I have stumbled across 
an old piece of advice from the Derby Tuning Guide:


"The structure of your classpath can affect Derby startup time and the 
time required to load a particular class.


The classpath is searched linearly, so locate Derby's libraries at the 
beginning of the classpath so that they are found first. If the 
classpath first points to a directory that contains multiple files, 
booting Derby can be very slow."


That may be an old, Java 1.2 concern, which no longer affects modern 
JVMs. I have a couple questions:


1) Is this still good advice when booting a large application like Derby 
via a classpath?


2) What about the modulepath? Can classes be faulted in faster by 
re-arranging the order of jar files on the modulepath?


Thanks,
-Rick



Re: generated code and jigsaw modules

2018-10-10 Thread Richard Hillegas

Thanks, Alan. This is very helpful.

On 10/10/18 9:53 AM, Alan Bateman wrote:

On 10/10/2018 16:37, Richard Hillegas wrote:

:

java.lang.invoke.MethodHandles.lookup().defineClass(generatedClassBytes)

This approach does indeed put the generated class where I want it: 
inside the Derby engine module. Unfortunately, the ClassLoader of the 
generated class is the application class loader. I can't figure out 
how to force the generated class to use the custom ClassLoader instead.
MethodHandles.lookup() creates a Lookup to the caller of the method so 
I assume you must be calling it from Derby code on the class path. If 
you want the class generated in the same runtime package as the code 
loaded from the database then you'll need to get a Lookup object to a 
class in that runtime package, perhaps with privateLookupIn.
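
A minimal sketch of that suggestion, for later readers (the anchor 
class and the byte array are illustrative; the anchor's package must be 
open to the calling module for privateLookupIn to succeed):

import java.lang.invoke.MethodHandles;

public class DefineInTargetPackage
{
    /**
     * Define generated bytecode in the same runtime package as anchor,
     * which is any class already living in the target runtime package.
     */
    public static Class<?> define(Class<?> anchor, byte[] generatedClassBytes)
        throws IllegalAccessException
    {
        MethodHandles.Lookup lookup =
            MethodHandles.privateLookupIn(anchor, MethodHandles.lookup());

        return lookup.defineClass(generatedClassBytes);
    }
}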




Alan's approach is a bit more complicated. It involves following the 
pattern in 
com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl. It 
involves generating a temporary module for each generated class and 
then adding more export directives to the engine module so that the 
generated module can call back into the engine. I have to say I'm a 
little confused about the implications of slow memory leaks with this 
approach. I don't know what happens to these generated modules and 
export directives when the generated class is garbage-collected.


More immediately, however, I am up against the same problem which 
plagues Rémi's approach: how do I get the generated module to resolve 
classes in the custom ClassLoader? More specifically, I am stuck 
trying to get the generated module to require the user-written 
modules, that is, the user-written jar files. What I am missing is 
the ability to retrieve the module names of these jar files so that I 
can craft requires directives. The only way I know to get a module 
name is to use ModuleFinder.of(Path...). Unfortunately, the Path 
interface is an abstraction for file systems and is not a good fit 
for locating a blob of bytes stored inside a database.
My mail wasn't suggesting an approach; I was just pointing out an 
example of code in the JDK that creates a dynamic module to 
encapsulate generated code. It just happens that there is one class in 
that module.


As regards classes in the database, that would require developing 
your own ModuleFinder which can find modules in the database. There was 
an example on jigsaw-dev recently where someone was looking for a 
ModuleFinder to find modules in WAR files [1], which might be useful 
for getting an idea of what is involved.


-Alan

[1] 
http://mail.openjdk.java.net/pipermail/jigsaw-dev/2018-September/013924.html










Re: generated code and jigsaw modules

2018-10-10 Thread Richard Hillegas
Thanks again to Rémi and Alan for their advice. Unfortunately, I have 
not been able to make either approach work, given another complexity of 
Derby's class loading. Let me explain that additional issue.


Derby lets users load jar files into the database. There they live as 
named blobs of bytes. The jar files contain user-defined data types, 
functions, procedures, and aggregators, which are coded in Java and can 
be used in SQL statements. Derby lets users wire these jar files into a 
custom classpath which drives a custom ClassLoader at query-execution 
time. I have not been able to make this custom ClassLoader work with 
either Rémi or Alan's approach. Note that a Derby engine manages many 
databases and each database can have its own custom ClassLoader.


I like the simplicity of Rémi's approach:

java.lang.invoke.MethodHandles.lookup().defineClass(generatedClassBytes)

This approach does indeed put the generated class where I want it: 
inside the Derby engine module. Unfortunately, the ClassLoader of the 
generated class is the application class loader. I can't figure out how 
to force the generated class to use the custom ClassLoader instead. As a 
consequence, the generated class cannot resolve user-defined functions 
which live inside jar files in the database. Poking the custom 
ClassLoader into the thread's context class loader before calling 
MethodHandles.lookup() doesn't work.


Alan's approach is a bit more complicated. It involves following the 
pattern in com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl. 
It involves generating a temporary module for each generated class and 
then adding more export directives to the engine module so that the 
generated module can call back into the engine. I have to say I'm a 
little confused about the implications of slow memory leaks with this 
approach. I don't know what happens to these generated modules and 
export directives when the generated class is garbage-collected.


More immediately, however, I am up against the same problem which 
plagues Rémi's approach: how do I get the generated module to resolve 
classes in the custom ClassLoader? More specifically, I am stuck trying 
to get the generated module to require the user-written modules, that 
is, the user-written jar files. What I am missing is the ability to 
retrieve the module names of these jar files so that I can craft 
requires directives. The only way I know to get a module name is to use 
ModuleFinder.of(Path...). Unfortunately, the Path interface is an 
abstraction for file systems and is not a good fit for locating a blob 
of bytes stored inside a database.
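
For reference, the file-system route mentioned above looks roughly like 
this (the jar path is illustrative); it is exactly the step that has no 
obvious counterpart for jars stored as blobs inside the database:

import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Paths;

public class PrintModuleNames
{
    public static void main(String... args)
    {
        ModuleFinder finder = ModuleFinder.of(Paths.get("/tmp/user-code.jar"));

        // prints the module name of each jar found on that path
        for (ModuleReference ref : finder.findAll())
        {
            System.out.println(ref.descriptor().name());
        }
    }
}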


I would appreciate any further advice about how to get over these speed 
bumps.


Thanks,
-Rick


On 10/4/18 9:10 AM, Richard Hillegas wrote:
I am looking for advice about how to tighten up module encapsulation 
while generating byte code on the fly. I ask this question on behalf 
of Apache Derby, a pure-Java relational database whose original code 
dates back to Java 1.2. I want to reduce Derby's attack-surface when 
running with a module path.


First a little context: A relational database is an interpreter for 
the SQL language. It converts SQL queries into byte code which then 
runs on a virtual machine embedded in the interpreter. In Derby's 
case, the virtual machine is the Java VM and the byte code is simply 
Java byte code. That is, a Derby query plan is a class whose byte code 
is generated on the fly at run time.


I have converted the Apache Derby codeline into a set of jigsaw 
modules: https://issues.apache.org/jira/browse/DERBY-6945. 
Unfortunately, I had to punch holes in the encapsulation of the main 
Derby module so that the generated query plans could call back into 
the Derby engine. That is because, by default, generated query plans 
load into the catch-all, unnamed module. Note that all of these 
generated classes live in a single package which does not belong to 
any named module.


1) Is it possible to load generated code into a named module?

2) Alternatively, can someone recommend another approach for 
preserving module encapsulation while generating classes on the fly?


I would appreciate any advice or examples which you can recommend.

Thanks,
-Rick






Re: generated code and jigsaw modules

2018-10-04 Thread Richard Hillegas

On 10/4/18 9:45 AM, Alan Bateman wrote:

On 04/10/2018 17:10, Richard Hillegas wrote:
I am looking for advice about how to tighten up module encapsulation 
while generating byte code on the fly. I ask this question on behalf 
of Apache Derby, a pure-Java relational database whose original code 
dates back to Java 1.2. I want to reduce Derby's attack-surface when 
running with a module path.


First a little context: A relational database is an interpreter for 
the SQL language. It converts SQL queries into byte code which then 
runs on a virtual machine embedded in the interpreter. In Derby's 
case, the virtual machine is the Java VM and the byte code is simply 
Java byte code. That is, a Derby query plan is a class whose byte 
code is generated on the fly at run time.


I have converted the Apache Derby codeline into a set of jigsaw 
modules: https://issues.apache.org/jira/browse/DERBY-6945. 
Unfortunately, I had to punch holes in the encapsulation of the main 
Derby module so that the generated query plans could call back into 
the Derby engine. That is because, by default, generated query plans 
load into the catch-all, unnamed module. Note that all of these 
generated classes live in a single package which does not belong to 
any named module.


1) Is it possible to load generated code into a named module?

2) Alternatively, can someone recommend another approach for 
preserving module encapsulation while generating classes on the fly?


I would appreciate any advice or examples which you can recommend.


There are a couple of places in the JDK where we spin bytecode into 
modules that are created at run-time. One example is in Nashorn 
and was presented by Michael Haupt at JVMLS 2017 [1]. There's a lot in 
that, so a simpler example to look at is in the XML transformation code 
[2], where a module is created at run-time for each translet. The 
module is fully encapsulated except for an entry point that it exports 
to the java.xml module in the parent module layer. In turn, java.xml 
exports one of its internal packages to the translet module to allow 
what may be the equivalent of your generated code calling back 
into the Derby engine.


-Alan

Thanks, Alan. I will study this example. Cheers!


[1] https://www.youtube.com/watch?v=Zk6a6jNZAt0
[2] 
http://hg.openjdk.java.net/jdk/jdk/raw-file/tip/src/java.xml/share/classes/com/sun/org/apache/xalan/internal/xsltc/trax/TemplatesImpl.java






Re: generated code and jigsaw modules

2018-10-04 Thread Richard Hillegas

On 10/4/18 9:26 AM, Remi Forax wrote:

- Original Message -

From: "Richard Hillegas" 
To: "core-libs-dev" 
Sent: Thursday, October 4, 2018 18:10:13
Subject: generated code and jigsaw modules
I am looking for advice about how to tighten up module encapsulation
while generating byte code on the fly. I ask this question on behalf of
Apache Derby, a pure-Java relational database whose original code dates
back to Java 1.2. I want to reduce Derby's attack-surface when running
with a module path.

First a little context: A relational database is an interpreter for the
SQL language. It converts SQL queries into byte code which then runs on
a virtual machine embedded in the interpreter. In Derby's case, the
virtual machine is the Java VM and the byte code is simply Java byte
code. That is, a Derby query plan is a class whose byte code is
generated on the fly at run time.

I have converted the Apache Derby codeline into a set of jigsaw modules:
https://issues.apache.org/jira/browse/DERBY-6945. Unfortunately, I had
to punch holes in the encapsulation of the main Derby module so that the
generated query plans could call back into the Derby engine. That is
because, by default, generated query plans load into the catch-all,
unnamed module. Note that all of these generated classes live in a
single package which does not belong to any named module.

1) Is it possible to load generated code into a named module?

2) Alternatively, can someone recommend another approach for preserving
module encapsulation while generating classes on the fly?

I would appreciate any advice or examples which you can recommend.

you can use Lookup.defineClass.

Thanks, Rémi. That gives me something to google up. Cheers!



Thanks,
-Rick

cheers,
Rémi





generated code and jigsaw modules

2018-10-04 Thread Richard Hillegas
I am looking for advice about how to tighten up module encapsulation 
while generating byte code on the fly. I ask this question on behalf of 
Apache Derby, a pure-Java relational database whose original code dates 
back to Java 1.2. I want to reduce Derby's attack-surface when running 
with a module path.


First a little context: A relational database is an interpreter for the 
SQL language. It converts SQL queries into byte code which then runs on 
a virtual machine embedded in the interpreter. In Derby's case, the 
virtual machine is the Java VM and the byte code is simply Java byte 
code. That is, a Derby query plan is a class whose byte code is 
generated on the fly at run time.


I have converted the Apache Derby codeline into a set of jigsaw modules: 
https://issues.apache.org/jira/browse/DERBY-6945. Unfortunately, I had 
to punch holes in the encapsulation of the main Derby module so that the 
generated query plans could call back into the Derby engine. That is 
because, by default, generated query plans load into the catch-all, 
unnamed module. Note that all of these generated classes live in a 
single package which does not belong to any named module.


1) Is it possible to load generated code into a named module?

2) Alternatively, can someone recommend another approach for preserving 
module encapsulation while generating classes on the fly?


I would appreciate any advice or examples which you can recommend.

Thanks,
-Rick



Re: handling the deprecations introduced by early access builds 116 and 118 of jdk 9

2016-06-06 Thread Richard Hillegas

Thanks for that response, Stuart. One comment inline...

On 5/31/16 5:34 PM, Stuart Marks wrote:



On 5/30/16 11:48 AM, Richard Hillegas wrote:
Dalibor Topic recommended that I post this feedback on core-libs-dev. 
This is my
feedback after ameliorating the deprecation warnings which surfaced 
when I
compiled and tested Apache Derby with early access builds 116 and 118 
of JDK 9.
Derby is a pure Java relational database whose original code goes 
back almost 20
years. Other large, old code bases (like Weblogic) may have similar 
experiences.
More detail on my experience can be found on the JIRA issue which 
tracks the

Derby community's attempt to keep our code evergreen against JDK 9:
https://issues.apache.org/jira/browse/DERBY-6856


Hi Rick,

Thanks for your feedback on the API deprecations.

A couple notes on deprecation. First, the deprecation JEP (JEP 277) 
[1] has clarified the definition of deprecation so that by default it 
no longer means that the API will be removed. In the absence of 
forRemoval=true, deprecation is merely a recommendation for code to 
migrate away from the annotated API. Only when the forRemoval=true 
element is present does it mean that the API is actually going to be 
removed. None of these deprecations has forRemoval=true, so there is 
no great urgency for anyone to migrate away from them.


Now, they will generate compilation warnings, which is quite possibly 
a problem. There are some existing mechanisms for disabling warnings, 
such as -Xlint:-deprecation and the @SuppressWarnings annotation. 
These might not be sufficient. We're considering adding some 
finer-grained mechanisms. Ideally, for deprecated APIs that aren't 
being removed, it should be possible to manage the warnings so that 
migration of any code base can proceed at whatever pace its 
maintainers feel is appropriate, without it being forced by any 
particular JDK release.
This was the issue which I faced. The Derby community has spent 
considerable effort on maintaining a clean build, one which doesn't 
swamp real error indications in a blizzard of diagnostic noise. At the 
same time, we are reluctant to wholesale-disable all deprecation 
warnings because, in general, they do provide useful advice about best 
practices. The ameliorations you are considering do sound useful. I 
don't have any better suggestions at this time.


Thanks,
-Rick


If you have any thoughts on how to better manage deprecation warnings, 
I'd love to hear them.


o Deprecating autoboxing constructors - Deprecating the autoboxing 
constructors
for primitive wrapper objects caused a large rototill of Derby code. 
That
rototill was comparable in size to the changes made necessary by Java 
5's
introduction of generics. Hopefully, IDEs can automate much of this 
chore.


The boxing constructors -- e.g., new Integer(432) -- are the ones 
being deprecated. The preferred alternative is Integer.valueOf(432). 
Note that *auto*boxing ends up calling valueOf() under the covers. 
Autoboxing is generally preferable, although not without pitfalls, 
such as the overloading of List.remove(int) vs List.remove(Object), as 
you stumbled across in the referenced bug report. Using valueOf() 
instead of autoboxing would have avoided the error.
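
A small illustration of that overloading pitfall (this is just a 
generic example, not the Derby code in question):

import java.util.ArrayList;
import java.util.List;

public class RemoveOverloadPitfall
{
    public static void main(String... args)
    {
        List<Integer> list = new ArrayList<>();
        list.add(5);
        list.add(10);
        list.add(15);

        list.remove(Integer.valueOf(10)); // removes the VALUE 10
        // list.remove(10);               // would call remove(int index) and
        //                                // throw IndexOutOfBoundsException here

        System.out.println(list);         // prints [5, 15]
    }
}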


o Deprecating Class.newInstance() - The deprecation of 
Class.newInstance()
forced a similarly large rototill. The code became more verbose. 
Additional
exceptions had to be caught and propagated up the call stack. For 
reasons which
I don't understand, I had better luck using 
Class.getConstructor().newInstance()
than Class.getDeclaredConstructor().newInstance(). But the former 
replacement
code requires you to make constructors public. For some code bases, 
that may
introduce security problems which are worse than the security problem 
being
addressed by this deprecation. I hope that IDEs and the release notes 
for JDK 9

will provide some guidance for how to handle these issues.


It would be good to understand why getDeclaredConstructor() didn't 
work. Clearly requiring a public no-arg constructor is a non-starter.
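
For later readers, the textbook shape of the migration looks roughly 
like this (the target class is just an illustration, and this sketch 
does not explain the trouble described above):

import java.lang.reflect.Constructor;

public class NewInstanceMigration
{
    public static void main(String... args) throws Exception
    {
        Class<?> clazz = Class.forName("java.util.ArrayList");

        // replacement which requires a public no-arg constructor
        Object viaPublic = clazz.getConstructor().newInstance();

        // replacement which also sees non-public constructors,
        // at the price of a setAccessible call
        Constructor<?> ctor = clazz.getDeclaredConstructor();
        ctor.setAccessible(true);
        Object viaDeclared = ctor.newInstance();

        System.out.println(viaPublic + " " + viaDeclared);
    }
}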


o Deprecating java.util.Observable and java.util.Observer - Two 
ameliorations

are recommended at
http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-April/040436.html. 
The
first suggestion (use the awt event model) runs very much counter to 
the whole
intent of Jigsaw. That is because pulling in awt can bloat up an 
application
with large, otherwise unneeded libraries. Using awt was out of the 
question for
Derby, given that the community had already invested a great deal of 
effort in
paring back Derby's dependencies in order to let the code run on JDK 
8 compact
profile 2. That left us with the other option: write your own 
replacement
classes. If a lot of people end up having to write the same 
replacement code,
then that argues for leaving this small but useful functionality in 
the JDK. I
think that the people who advocated for this deprecation did not have 
good visibility into how widely these classes are being used in the 
wild. I recommend that this deprecation be re-evaluated.

handling the deprecations introduced by early access builds 116 and 118 of jdk 9

2016-05-31 Thread Richard Hillegas
Dalibor Topic recommended that I post this feedback on core-libs-dev. 
This is my feedback after ameliorating the deprecation warnings which 
surfaced when I compiled and tested Apache Derby with early access 
builds 116 and 118 of JDK 9. Derby is a pure Java relational database 
whose original code goes back almost 20 years. Other large, old code 
bases (like Weblogic) may have similar experiences. More detail on my 
experience can be found on the JIRA issue which tracks the Derby 
community's attempt to keep our code evergreen against JDK 9: 
https://issues.apache.org/jira/browse/DERBY-6856


o Deprecating autoboxing constructors - Deprecating the autoboxing 
constructors for primitive wrapper objects caused a large rototill of 
Derby code. That rototill was comparable in size to the changes made 
necessary by Java 5's introduction of generics. Hopefully, IDEs can 
automate much of this chore.


o Deprecating Class.newInstance() - The deprecation of 
Class.newInstance() forced a similarly large rototill. The code became 
more verbose. Additional exceptions had to be caught and propagated up 
the call stack. For reasons which I don't understand, I had better luck 
using Class.getConstructor().newInstance() than 
Class.getDeclaredConstructor().newInstance(). But the former replacement 
code requires you to make constructors public. For some code bases, that 
may introduce security problems which are worse than the security 
problem being addressed by this deprecation. I hope that IDEs and the 
release notes for JDK 9 will provide some guidance for how to handle 
these issues.


o Deprecating java.util.Observable and java.util.Observer - Two 
ameliorations are recommended at 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-April/040436.html. 
The first suggestion (use the awt event model) runs very much counter to 
the whole intent of Jigsaw. That is because pulling in awt can bloat up 
an application with large, otherwise unneeded libraries. Using awt was 
out of the question for Derby, given that the community had already 
invested a great deal of effort in paring back Derby's dependencies in 
order to let the code run on JDK 8 compact profile 2. That left us with 
the other option: write your own replacement classes. If a lot of people 
end up having to write the same replacement code, then that argues for 
leaving this small but useful functionality in the JDK. I think that the 
people who advocated for this deprecation did not have good visibility 
into how widely these classes are being used in the wild. I recommend 
that this deprecation be re-evaluated.
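
A minimal sketch of the "write your own replacement classes" option 
(the names here are ours, not Derby's):

import java.util.ArrayList;
import java.util.List;

public class SimpleObservable
{
    // a stand-in for java.util.Observer
    public interface ChangeListener
    {
        void changed(Object source, Object arg);
    }

    private final List<ChangeListener> listeners = new ArrayList<>();

    public void addListener(ChangeListener listener) { listeners.add(listener); }

    // a stand-in for Observable.notifyObservers(Object)
    public void notifyListeners(Object arg)
    {
        for (ChangeListener listener : listeners) { listener.changed(this, arg); }
    }

    public static void main(String... args)
    {
        SimpleObservable observed = new SimpleObservable();
        observed.addListener((source, arg) -> System.out.println("changed: " + arg));
        observed.notifyListeners("hello");
    }
}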


Thanks,
-Rick



Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-06 Thread Richard Hillegas

Hi Rajesh,

The 1.6 schedule is available on the front page of the Spark wiki:
https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. I don't
know of any workarounds for this problem.

Thanks,
Rick


Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
06:35:22 PM:

> From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "u...@spark.incubator.apache.org"
> <u...@spark.incubator.apache.org>, "user@spark.apache.org"
> <user@spark.apache.org>
> Date: 11/05/2015 06:35 PM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Richard,

> Thank you for the updates. Do you know tentative timeline for 1.6
> release? Mean while, any workaround solution for this issue?

> Regards,
> Rajesh
>

>
> On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648
> . That issue has a couple pull requests but I think that the limited
> bandwidth of the committers still applies.
>
> Thanks,
> Rick
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > Cc: "user@spark.apache.org" <user@spark.apache.org>,
> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > Date: 11/05/2015 09:17 AM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> >
> > Hi Rajesh,
> >
> > I think that you may be referring to https://issues.apache.org/jira/
> > browse/SPARK-10909. A pull request on that issue was submitted more
> > than a month ago but it has not been committed. I think that the
> > committers are busy working on issues which were targeted for 1.6
> > and I doubt that they will have the spare cycles to vet that pull
request.
> >
> > Thanks,
> > Rick
> >
> >
> > Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
> > 05:51:29 AM:
> >
> > > From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > > To: "user@spark.apache.org" <user@spark.apache.org>,
> > > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > > Date: 11/05/2015 05:51 AM
> > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > >
> > > Hi,
> >
> > > Is this issue fixed in 1.5.1 version?
> >
> > > Regards,
> > > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648. That issue has a couple
pull requests but I think that the limited bandwidth of the committers
still applies.

Thanks,
Rick


Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>,
> "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> Date: 11/05/2015 09:17 AM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Rajesh,
>
> I think that you may be referring to https://issues.apache.org/jira/
> browse/SPARK-10909. A pull request on that issue was submitted more
> than a month ago but it has not been committed. I think that the
> committers are busy working on issues which were targeted for 1.6
> and I doubt that they will have the spare cycles to vet that pull
request.
>
> Thanks,
> Rick
>
>
> Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
> 05:51:29 AM:
>
> > From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > To: "user@spark.apache.org" <user@spark.apache.org>,
> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > Date: 11/05/2015 05:51 AM
> > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> >
> > Hi,
>
> > Is this issue fixed in 1.5.1 version?
>
> > Regards,
> > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Hi Rajesh,

I think that you may be referring to
https://issues.apache.org/jira/browse/SPARK-10909. A pull request on that
issue was submitted more than a month ago but it has not been committed. I
think that the committers are busy working on issues which were targeted
for 1.6 and I doubt that they will have the spare cycles to vet that pull
request.

Thanks,
Rick


Madabhattula Rajesh Kumar  wrote on 11/05/2015
05:51:29 AM:

> From: Madabhattula Rajesh Kumar 
> To: "user@spark.apache.org" ,
> "u...@spark.incubator.apache.org" 
> Date: 11/05/2015 05:51 AM
> Subject: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi,

> Is this issue fixed in 1.5.1 version?

> Regards,
> Rajesh

Re: Spark scala REPL - Unable to create sqlContext

2015-10-26 Thread Richard Hillegas

Note that embedded Derby supports multiple, simultaneous connections, that
is, multiple simultaneous users. But a Derby database is owned by the
process which boots it. Only one process can boot a Derby database at a
given time. The creation of multiple SQL contexts must be spawning multiple
attempts to boot and own the database. If multiple different processes want
to access the same Derby database simultaneously, then the database should
be booted by the Derby network server. After that, the processes which want
to access the database simultaneously can use the Derby network client
driver, not the Derby embedded driver. For more information, see the Derby
Server and Administration Guide:
http://db.apache.org/derby/docs/10.12/adminguide/index.html
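
A minimal sketch of that configuration from the client side (the 
database name and the default network server port 1527 are 
assumptions):

import java.sql.Connection;
import java.sql.DriverManager;

public class NetworkClientConnect
{
    public static void main(String... args) throws Exception
    {
        // boot the Derby network server separately; then every process can
        // connect through the client driver instead of the embedded driver
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:derby://localhost:1527/metastore_db;create=true"))
        {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}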

Thanks,
Rick Hillegas



Deenar Toraskar  wrote on 10/25/2015 11:29:54
PM:

> From: Deenar Toraskar 
> To: "Ge, Yao (Y.)" 
> Cc: Ted Yu , user 
> Date: 10/25/2015 11:30 PM
> Subject: Re: Spark scala REPL - Unable to create sqlContext
>
> Embedded Derby, which Hive/Spark SQL uses as the default metastore
> only supports a single user at a time. Till this issue is fixed, you
> could use another metastore that supports multiple concurrent users
> (e.g. networked derby or mysql) to get around it.
>
> On 25 October 2015 at 16:15, Ge, Yao (Y.)  wrote:
> Thanks. I wonder why this is not widely reported in the user forum.
> The REPL shell is basically broken in 1.5.0 and 1.5.1
> -Yao
>
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Sunday, October 25, 2015 12:01 PM
> To: Ge, Yao (Y.)
> Cc: user
> Subject: Re: Spark scala REPL - Unable to create sqlContext
>
> Have you taken a look at the fix for SPARK-11000 which is in the
> upcoming 1.6.0 release ?
>
> Cheers
>
> On Sun, Oct 25, 2015 at 8:42 AM, Yao  wrote:
> I have not been able to start Spark scala shell since 1.5 as it was not
able
> to create the sqlContext during the startup. It complains the
metastore_db
> is already locked: "Another instance of Derby may have already booted the
> database". The Derby log is attached.
>
> I only have this problem with starting the shell in yarn-client mode. I
am
> working with HDP2.2.6 which runs Hadoop 2.6.
>
> -Yao derby.log
>

>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Spark-scala-REPL-Unable-to-create-sqlContext-
> tp25195.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

Re: Can not subscribe to mailing list

2015-10-20 Thread Richard Hillegas

Hi Jeff,

Hard to say what's going on. I have had problems subscribing to the Apache
lists in the past. My problems, which may be different than yours, were
caused by replying to the confirmation request from a different email
account than the account I was trying to subscribe from. It was easy for me
to get confused because I was using a single mail tool to manage multiple
email accounts (a personal account, a yahoo account, and a gmail account).
Check your confirmation response to see which email account responded to
the confirmation request.

Hope this helps,
Rick


"jeff.sadow...@gmail.com"  wrote on 10/20/2015
08:48:49 AM:

> From: "jeff.sadow...@gmail.com" 
> To: user@spark.apache.org
> Date: 10/20/2015 08:49 AM
> Subject: Can not subscribe to mailing list
>
> I am having issues subscribing to the user@spark.apache.org mailing list.
>
> I would like to be added to the mailing list so I can post some
> configuration questions I have to the list that I do not see asked on the
> list.
>
> When I tried adding myself I got an email titled "confirm subscribe to
> user@spark.apache.org" but after replying as it says to do I get nothing.
I
> tried today to remove and re-add myself and I got a reply back saying I
was
> not on the list when trying to unsubscribe. When I tried to add myself
again
> I don't get any emails from it this time. I'm getting other email from
other
> people and nothing is in spam. I tried with a second email account as
well
> and the same thing is happening on it. I got the initial "confirm
subscribe
> to user@spark.apache.org" email but after replying I get nothing. I can't
> even get another "confirm subscribe to user@spark.apache.org" message.
Both
> of my emails are from google servers one is an organization email the
first
> is a personal google email
>
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/Can-not-subscript-to-mailing-list-tp25143.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Richard Hillegas

As an academic aside, note that all datatypes are nullable according to the
SQL Standard. NOT NULL is modelled in the Standard as a constraint on data
values, not as a parallel universe of special data types. However, very few
databases implement NOT NULL via integrity constraints. Instead, almost all
relational database type systems model NOT NULL as an extra bit of metadata
alongside precision, scale, and length.

Thanks,
Rick Hillegas


Xiao Li  wrote on 10/20/2015 01:17:43 PM:

> From: Xiao Li 
> To: Michael Armbrust 
> Cc: Jerry Lam , "user@spark.apache.org"
> 
> Date: 10/20/2015 01:18 PM
> Subject: Re: Spark SQL: Preserving Dataframe Schema
>
> Sure. Will try to do a pull request this week.
>
> Schema evolution is always painful for database people. IMO, NULL is
> a bad design in the original system R. It introduces a lot of
> problems during the system migration and data integration.
>
> Let me find a possible scenario: RDBMS is used as an ODS. Spark is
> used as an external online data analysis engine. The results could
> be stored in Parquet files and inserted back RDBMS every interval.
> In this case, we could face a few options:
>
> - Change the data types of columns in RDBMS tables to support the
> possible nullable values and the logics of RDBMS applications that
> consume these results must also support NULL. When the applications
> are third-party, changing the applications become harder.
>
> - As what you suggested, before loading the data from the Parquet
> files, we need to add an extra step to do a possible data cleaning,
> value transformation or exception reporting in case of finding NULL.
>
> If having such an external parameter, when writing data schema to
> external data store, Spark will do its best to keep the original
> schema without any change (e.g., keep the initial definition of
> nullability). If some data type/schema conversions are not
> avoidable, it will issue warnings or errors to the users. Does that
> make sense?
>
> Thanks,
>
> Xiao Li
>
>  In this case,
>
> 2015-10-20 12:38 GMT-07:00 Michael Armbrust :
> First, this is not documented in the official document. Maybe we
> should do it?
http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Pull requests welcome.
>
> Second, nullability is a significant concept in the database people.
> It is part of schema. Extra codes are needed for evaluating if a
> value is null for all the nullable data types. Thus, it might cause
> a problem if you need to use Spark to transfer the data between
> parquet and RDBMS. My suggestion is to introduce another external
parameter?
>
> Sure, but a traditional RDBMS has the opportunity to do validation
> before loading data in.  Thats not really an option when you are
> reading random files from S3.  This is why Hive and many other
> systems in this space treat all columns as nullable.
>
> What would the semantics of this proposed external parameter be?

Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas

Thanks for everyone's patience with this email thread. I have fixed my
environmental problem and my tests run cleanly now. This seems to be a
problem which afflicts modern JVMs on Mac OSX (and maybe other unix
variants). The following can happen on these platforms:

  InetAddress.getLocalHost().isReachable( 2000 ) == false

If this happens to you, the fix is to add the following line to /etc/hosts:

127.0.0.1   localhost $yourMachineName

where $yourMachineName is the result of the hostname command. For more
information, see
http://stackoverflow.com/questions/1881546/inetaddress-getlocalhost-throws-unknownhostexception
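
A quick way to check for the condition described above (prints false on 
an affected machine):

import java.net.InetAddress;

public class LocalhostCheck
{
    public static void main(String... args) throws Exception
    {
        System.out.println(InetAddress.getLocalHost().isReachable(2000));
    }
}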

Thanks,
-Rick




Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 11:15:29 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev <dev@spark.apache.org>
> Date: 10/15/2015 11:16 AM
> Subject: Re: Network-related environmental problem when running
JDBCSuite

>
> Continuing this lively conversation with myself (hopefully this
> archived thread may be useful to someone else in the future):
>
> I set the following environment variable as recommended by this page:
> http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-
> master-using-a-remote-cluster-with-two-workers
>
> export SPARK_LOCAL_IP=127.0.0.1
>
> Then I got errors related to booting the metastore_db. So I deleted
> that directory. After that I was able to run spark-shell again.
>
> Now let's see if this hack fixes the tests...
>
>
> Thanks,
> Rick Hillegas
>
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev <dev@spark.apache.org>
> > Date: 10/15/2015 10:51 AM
> > Subject: Re: Network-related environmental problem when running
JDBCSuite
> >
> > For the record, I get the same error when I simply try to boot the
> > spark shell:
> >
> > bash-3.2$ bin/spark-shell
> > log4j:WARN No appenders could be found for logger
> > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> > log4j:WARN Please initialize the log4j system properly.
> > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> > for more info.
> > Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-
> > repl.properties
> > To adjust logging level use sc.setLogLevel("INFO")
> > Welcome to
> >     __
> >  / __/__  ___ _/ /__
> > _\ \/ _ \/ _ `/ __/  '_/
> >/___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
> >   /_/
> >
> > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM,
> Java 1.8.0_60)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty transport
> > 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> > on port 0. Attempting port 1.
> > 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> > terminated abrubtly. Attempting to shut down transports
> > 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> > 156:0, shutting down Netty 

Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas


I am seeing what look like environmental errors when I try to run a test on
a clean local branch which has been sync'd to the head of the development
trunk. I would appreciate advice about how to debug or hack around this
problem. For the record, the test ran cleanly last week. This is the
experiment I am running:

# build
mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean
package

# run one suite
mvn -Dhadoop.version=2.4.0 -DwildcardSuites=JDBCSuite

The test bombs out before getting to JDBCSuite. I see this summary at the
end...

[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS
[  2.023 s]
[INFO] Spark Project Test Tags  SUCCESS
[  1.924 s]
[INFO] Spark Project Launcher . SUCCESS
[  5.837 s]
[INFO] Spark Project Networking ... SUCCESS
[ 12.498 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [01:28
min]
[INFO] Spark Project Unsafe ... SUCCESS [01:09
min]
[INFO] Spark Project Core . SUCCESS [02:45
min]
[INFO] Spark Project Bagel  SUCCESS
[ 30.182 s]
[INFO] Spark Project GraphX ... SUCCESS
[ 59.002 s]
[INFO] Spark Project Streaming  FAILURE [06:21
min]
[INFO] Spark Project Catalyst . SKIPPED
[INFO] Spark Project SQL .. SKIPPED
[INFO] Spark Project ML Library ... SKIPPED
[INFO] Spark Project Tools  SKIPPED
[INFO] Spark Project Hive . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project Assembly . SKIPPED
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External Flume Assembly .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External MQTT Assembly ... SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 13:37 min
[INFO] Finished at: 2015-10-15T09:03:06-07:00
[INFO] Final Memory: 69M/793M
[INFO]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test)
on project spark-streaming_2.10: There are test failures.
[ERROR]
[ERROR] Please refer
to /Users/rhillegas/spark/spark/streaming/target/surefire-reports for the
individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn  -rf :spark-streaming_2.10



From the logs in streaming/target/surefire-reports, it appears that the
following tests failed...

org.apache.spark.streaming.JavaAPISuite.txt
org.apache.spark.streaming.JavaReceiverAPISuite.txt

...with this error:

java.net.BindException: Failed to bind to: /9.52.158.156:0: Service
'sparkDriver' failed after 100 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind
(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply
(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply
(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch
(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply

Re: SQL Context error in 1.5.1 - any work around ?

2015-10-15 Thread Richard Hillegas
A crude workaround may be to run your spark shell with a sudo command.

Hope this helps,
Rick Hillegas


Sourav Mazumder  wrote on 10/15/2015 09:59:02
AM:

> From: Sourav Mazumder 
> To: user 
> Date: 10/15/2015 09:59 AM
> Subject: SQL Context error in 1.5.1 - any work around ?
>
> I keep on getting this error whenever I'm starting spark-shell : The
> root scratch dir: /tmp/hive on HDFS should be writable. Current
> permissions are: rwx--.

> I cannot work with this if I need to do anything with sqlContext as
> that does not get created.
>
> I could see that a bug is raised for this https://issues.apache.org/
> jira/browse/SPARK-10066.

> However, is there any work around for this.

> I didn't face this problem in 1.4.1

> Regards,
> Sourav

Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas

Continuing this lively conversation with myself (hopefully this archived
thread may be useful to someone else in the future):

I set the following environment variable as recommended by this page:
http://stackoverflow.com/questions/29906686/failed-to-bind-to-spark-master-using-a-remote-cluster-with-two-workers

export SPARK_LOCAL_IP=127.0.0.1

Then I got errors related to booting the metastore_db. So I deleted that
directory. After that I was able to run spark-shell again.

Now let's see if this hack fixes the tests...


Thanks,
Rick Hillegas



Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 10:50:55 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 10/15/2015 10:51 AM
> Subject: Re: Network-related environmental problem when running
JDBCSuite

>
> For the record, I get the same error when I simply try to boot the
> spark shell:
>
> bash-3.2$ bin/spark-shell
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> for more info.
> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-
> repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
>   /_/
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_60)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to bind to /9.52.158.
> 156:0, shutting down Netty transport
> 15/10/15 10:49:09 WARN Utils: Service 'sparkDriver' could not bind
> on port 0. Attempting port 1.
> 15/10/15 10:49:09 ERROR Remoting: Remoting system has been
> terminated abrubtly. Attempting to shut down transports
> 15/10/15 10:49:09 ERROR NettyTransport: failed to b

Re: Network-related environmental problem when running JDBCSuite

2015-10-15 Thread Richard Hillegas
scala:1323)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:100)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:680)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

<console>:10: error: not found: value sqlContext
   import sqlContext.implicits._
          ^
<console>:10: error: not found: value sqlContext
   import sqlContext.sql

Thanks,
Rick Hillegas



Richard Hillegas/San Francisco/IBM@IBMUS wrote on 10/15/2015 09:47:22 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Dev <dev@spark.apache.org>
> Date: 10/15/2015 09:47 AM
> Subject: Network-related environmental problem when running JDBCSuite
>
> I am seeing what look like environmental errors when I try to run a
> test on a clean local branch which has been sync'd to the head of
> the development trunk. I would appreciate advice about how to debug
> or hack around this problem. For the record, the test ran cleanly
> last week. This is the experiment I am running:
>
> 

Re: pagination spark sq

2015-10-12 Thread Richard Hillegas

Hi Ravi,

If you build Spark with Hive support, then your sqlContext variable will be
an instance of HiveContext and you will enjoy the full capabilities of the
Hive query language rather than the more limited capabilities of Spark SQL.
However, even Hive QL does not support the OFFSET clause, at least
according to the Hive language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual. Hive does
support the LIMIT clause. The following works for me:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val hc = sqlContext

val schema =
  StructType(
StructField("x", IntegerType, nullable=false) ::
StructField("y", DoubleType, nullable=false) :: Nil)

val rdd = sc.parallelize(
  Row(1, 1.0) :: Row(2, 1.34) :: Row(3, 2.3) :: Row(4, 2.5) :: Nil)

val df = hc.createDataFrame(rdd, schema)

df.registerTempTable("test_data")

hc.sql("SELECT * FROM test_data LIMIT 3").show()

exit()


So, to sum up, Hive QL supports a subset of the MySQL LIMIT/OFFSET syntax
(limit, no offset) but does not support the SQL Standard language for
returning a block of rows offset into a large query result.
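
For illustration, here is one way to emulate OFFSET on top of the LIMIT support
that does exist: number the rows with zipWithIndex and filter on the index. This
is only a minimal sketch; it reuses the hc and test_data names from the example
above, orders by the x column, and picks a tiny page size purely for
demonstration.

val pageStart = 2L   // rows to skip (the would-be OFFSET)
val pageSize  = 2L   // rows to return (the LIMIT)

val ordered  = hc.sql("SELECT * FROM test_data ORDER BY x")
val numbered = ordered.rdd.zipWithIndex()   // pair each Row with a stable index

val page = numbered
  .filter { case (_, idx) => idx >= pageStart && idx < pageStart + pageSize }
  .map(_._1)

page.collect().foreach(println)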

Hope this helps,
Rick Hillegas



Ravisankar Mani  wrote on 10/12/2015 07:05:05 AM:

> From: Ravisankar Mani 
> To: user@spark.apache.org
> Date: 10/12/2015 07:05 AM
> Subject: pagination spark sq
>
> Hi everyone,
>
> Can you please share optimized query for pagination spark sql?
>

> In Ms SQL Server, They have supported "offset" method query for
> specific row selection.

> Please find the following query

> Select BusinessEntityID,[FirstName], [LastName],[JobTitle]
> from HumanResources.vEmployee
> Order By BusinessEntityID
> --OFFSET 10 ROWS
> FETCH NEXT 10 ROWS ONLY

> Is this support OFFSET method in spark sql? Kindly share the useful
details.

> Regards,
> Ravi

draft release announcement for 10.12.1.1

2015-10-10 Thread Richard Hillegas
Here is a first draft of a release announcement for 10.12.1.1. Please 
let me know how I can improve this.


Thanks,
-Rick

---

The Apache Derby project is pleased to announce feature release 10.12.1.1.

Apache Derby is a subproject of the Apache DB project. Derby is a pure 
Java relational database engine which conforms to the ISO/ANSI SQL and 
JDBC standards. Derby aims to be easy for developers and end-users to 
work with.


Derby 10.12.1.1 can be obtained from the Derby download site:

   http://db.apache.org/derby/derby_downloads.html.

Derby 10.12.1.1 contains the following new features:

* ALTER TABLE and identity columns - The ALTER TABLE command can now be 
used to add identity columns. See the section on this statement in 
the Derby Reference Manual.
* Cache-monitoring MBean - An MBean has been added for monitoring 
internal Derby caches. See the description of CacheManagerMBean in the 
"Introduction to the Derby MBeans" section of the Derby Server and 
Administration Guide.
* Optional Tool for Handling JSON Data - An optional tool has been 
added for packing query results into JSON documents and for unpacking 
JSON documents into tabular result sets. See the section on the 
simpleJson optional tool in the Derby Tools and Utilities Guide.
* Statistics aggregates - SQL Standard VAR_POP(), VAR_SAMP(), 
STDDEV_POP(), and STDDEV_SAMP() aggregates have been added. See the 
"Aggregates (set functions)" section in the Derby Reference Manual.


In addition, Derby 10.12.1.1 contains many bug, security, and 
documentation fixes.


Please try out this new release.



Re: This post has NOT been accepted by the mailing list yet.

2015-10-07 Thread Richard Hillegas

Hi Akhandeshi,

It may be that you are not seeing your own posts because you are sending
from a gmail account. See for instance
https://support.google.com/a/answer/1703601?hl=en

Hope this helps,
Rick Hillegas
STSM, IBM Analytics, Platform - IBM USA


akhandeshi  wrote on 10/07/2015 08:10:32 AM:

> From: akhandeshi 
> To: user@spark.apache.org
> Date: 10/07/2015 08:10 AM
> Subject: This post has NOT been accepted by the mailing list yet.
>
> I seem to see this for many of my posts... does anyone have solution?
>
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/This-post-has-NOT-been-accepted-by-the-
> mailing-list-yet-tp24969.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

[RESULT] [VOTE] 10.12.1.1 release

2015-10-05 Thread Richard Hillegas
Thanks for everyone's work on coding, documenting, and testing 
10.12.1.1. The polls have closed. The community has approved 10.12.1.1 
as an official Derby release:



+1:
Bryan Pendleton (pmc)
Kim Haase (pmc)
Rick Hillegas (pmc)
Kathey Marsden (pmc)
Myrna van Lunteren (pmc)

No other votes were cast.



Re: save DF to JDBC

2015-10-05 Thread Richard Hillegas

Hi Ruslan,

Here is some sample code which writes a DataFrame to a table in a Derby
database:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val binaryVal = Array[Byte] ( 1, 2, 3, 4 )
val timestampVal = java.sql.Timestamp.valueOf("1996-01-01 03:30:36")
val dateVal = java.sql.Date.valueOf("1996-01-01")

val allTypes = sc.parallelize(
Array(
  (1,
  1.toLong,
  1.toDouble,
  1.toFloat,
  1.toShort,
  1.toByte,
  "true".toBoolean,
  "one ring to rule them all",
  binaryVal,
  timestampVal,
  dateVal,
  BigDecimal.valueOf(42549.12)
  )
)).toDF(
  "int_col",
  "long_col",
  "double_col",
  "float_col",
  "short_col",
  "byte_col",
  "boolean_col",
  "string_col",
  "binary_col",
  "timestamp_col",
  "date_col",
  "decimal_col"
  )

val properties = new java.util.Properties()

allTypes.write.jdbc("jdbc:derby:/Users/rhillegas/derby/databases/derby1",
"all_spark_types", properties)

Hope this helps,

Rick Hillegas
STSM, IBM Analytics, Platform - IBM USA


Ruslan Dautkhanov  wrote on 10/05/2015 02:44:20 PM:

> From: Ruslan Dautkhanov 
> To: user 
> Date: 10/05/2015 02:45 PM
> Subject: save DF to JDBC
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-
> to-other-databases
>
> Spark JDBC can read data from JDBC, but can it save back to JDBC?
> Like to an Oracle database through its jdbc driver.
>
> Also looked at SQL Context documentation
> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/
> SQLContext.html
> and can't find anything relevant.
>
> Thanks!
>
>
> --
> Ruslan Dautkhanov

Re: [VOTE] 10.12.1.1 release

2015-09-30 Thread Richard Hillegas

+1

The platform coverage looks impressive at 
https://wiki.apache.org/db-derby/TenTwelveOnePlatformTesting. I would 
prefer to see more checklist items addressed for the next release, but I 
think we did a good enough job: 
http://wiki.apache.org/db-derby/TenTwelveOneChecklist


Thanks,
-Rick

On 9/20/15 10:14 AM, Richard Hillegas wrote:
Please test-drive the 10.12.1.1 candidate, then vote on whether to 
accept it as a Derby release. The candidate lives at:


  http://people.apache.org/~rhillegas/10.12.1.1/

The polls close at 5:00 pm San Francisco time on Monday, October 5.

10.12.1.1 is a feature release, described in greater detail here: 
https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease


Thanks to everyone who contributed to this release.

Regards,
-Rick






Re: unsubscribe

2015-09-30 Thread Richard Hillegas

Hi Sukesh,

To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org. To unsubscribe from the user list, please
send a message to user-unsubscr...@spark.apache.org. Please see:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

sukesh kumar  wrote on 09/28/2015 11:39:01 PM:

> From: sukesh kumar 
> To: "u...@spark.apache.org" ,
> "dev@spark.apache.org" 
> Date: 09/28/2015 11:39 PM
> Subject: unsubscribe
>
> unsubscribe
>
> --
> Thanks & Best Regards
> Sukesh Kumar

Re: unsubscribe

2015-09-30 Thread Richard Hillegas

Hi Sukesh,

To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org. To unsubscribe from the user list, please
send a message to user-unsubscr...@spark.apache.org. Please see:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

sukesh kumar  wrote on 09/28/2015 11:39:01 PM:

> From: sukesh kumar 
> To: "user@spark.apache.org" ,
> "d...@spark.apache.org" 
> Date: 09/28/2015 11:39 PM
> Subject: unsubscribe
>
> unsubscribe
>
> --
> Thanks & Best Regards
> Sukesh Kumar

Re: Derby version used by Hive

2015-09-28 Thread Richard Hillegas
Thanks! Some responses inline...

"kulkarni.swar...@gmail.com" <kulkarni.swar...@gmail.com> wrote on
09/28/2015 10:08:08 AM:

> From: "kulkarni.swar...@gmail.com" <kulkarni.swar...@gmail.com>
> To: "dev@hive.apache.org" <dev@hive.apache.org>
> Date: 09/28/2015 10:08 AM
> Subject: Re: Derby version used by Hive
>
> Richard,
>
> A quick eye-balling of the code doesn't show anything that could
> potentially be a blocker for this upgrade. Also +1 on staying on the
latest
> and greatest. Please feel free to open up a JIRA and submit the patch.

Great! I'll try my hand at this after Derby 10.12.1.1 is published to the
maven repositories next week.

>
> Also just out of curiosity, what are you really using a derby backed
store
> for?

Right now just for testing. Its standards compliance makes it a good
starting point for implementing a portable SQL layer.

Thanks,
-Rick

>
> On Mon, Sep 28, 2015 at 11:02 AM, Richard Hillegas <rhil...@us.ibm.com>
> wrote:
>
> >
> >
> > I haven't received a response to the following message, which I posted
last
> > week. Maybe my message rambled too much. Here is an attempt to pose my
> > question more succinctly:
> >
> > Q: Does anyone know of any reason why we can't upgrade Hive's Derby
version
> > to 10.12.1.1, the new version being vetted by the Derby community right
> > now?
> >
> > Thanks,
> > -Rick
> >
> > > I am following the Hive build instructions here:
> > >
> >
> > https://cwiki.apache.org/confluence/display/Hive/
> GettingStarted#GettingStarted-InstallationandConfiguration
> > > .
> > >
> > > I noticed that Hive development seems to be using an old version of
> > Derby:
> > > 10.10.2.0. Is there some defect in the most recent Derby version
> > > (10.11.1.1) which prevents Hive from upgrading to 10.11.1.1? The only
> > > Hive-tagged Derby bug which I can find is
> > > https://issues.apache.org/jira/browse/DERBY-6358. That issue doesn't
> > seem
> > > to be version-specific and it mentions a resolved Hive issue:
> > > https://issues.apache.org/jira/browse/HIVE-8739.
> > >
> > > Staying with 10.10.2.0 makes sense if you need to run on some ancient
> > JVMs:
> > > Java SE 5 or Java ME CDC/Foundation Profile 1.1. Hadoop, however,
> > requires
> > > at least Java 6 according to
> > > https://wiki.apache.org/hadoop/HadoopJavaVersions.
> > >
> > > Note that the Derby community expects to release version 10.12.1.1
soon:
> > > https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease. This might
be
> > a
> > > good opportunity for Hive to upgrade to a more capable version of
Derby.
> > >
> > > I mention this because the Derby version used by Hive ends up on the
> > > classpath used by downstream projects (like Spark). That makes it
awkward
> > > for downstream projects to use more current Derby versions. Do you
know
> > of
> > > any reason that downstream projects shouldn't override the Derby
version
> > > currently preferred by Hive?
> > >
> > > Thanks,
> > > -Rick
> >
>
>
>
> --
> Swarnim

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-28 Thread Richard Hillegas
Thanks, Sean!

Sean Owen <so...@cloudera.com> wrote on 09/25/2015 06:35:46 AM:

> From: Sean Owen <so...@cloudera.com>
> To: Reynold Xin <r...@databricks.com>, Richard Hillegas/San
> Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 09/25/2015 07:21 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Work underway at ...
>
> https://issues.apache.org/jira/browse/SPARK-10833
> https://github.com/apache/spark/pull/8919
>
>
>
> On Fri, Sep 25, 2015 at 8:54 AM, Sean Owen <so...@cloudera.com> wrote:
> > Update: I *think* the conclusion was indeed that nothing needs to
> > happen with NOTICE.
> > However, along the way in
> > https://issues.apache.org/jira/browse/LEGAL-226 it emerged that the
> > BSD/MIT licenses should be inlined into LICENSE (or copied in the
> > distro somewhere). I can get on that -- just some grunt work to copy
> > and paste it all.
> >
> > On Thu, Sep 24, 2015 at 6:55 PM, Reynold Xin <r...@databricks.com>
wrote:
> >> Richard,
> >>
> >> Thanks for bringing this up and this is a great point. Let's start
another
> >> thread for it so we don't hijack the release thread.
> >>
> >>
> >>
> >> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen <so...@cloudera.com>
wrote:
> >>>
> >>> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas
<rhil...@us.ibm.com>
> >>> wrote:
> >>> > Under your guidance, I would be happy to help compile a NOTICE file
> >>> > which
> >>> > follows the pattern used by Derby and the JDK. This effort might
proceed
> >>> > in
> >>> > parallel with vetting 1.5.1 and could be targeted at a later
release
> >>> > vehicle. I don't think that the ASF's exposure is greatly increased
by
> >>> > one
> >>> > more release which follows the old pattern.
> >>>
> >>> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> >>> been trying to do and seems like we're even required to do so, not
> >>> follow a different convention. There is some specific guidance there
> >>> about what to add, and not add, to these files. Specifically, because
> >>> the AL2 requires downstream projects to embed the contents of NOTICE,
> >>> the guidance is to only include elements in NOTICE that must appear
> >>> there.
> >>>
> >>> Put it this way -- what would you like to change specifically? (you
> >>> can start another thread for that)
> >>>
> >>> >> My assessment (just looked before I saw Sean's email) is the same
as
> >>> >> his. The NOTICE file embeds other projects' licenses.
> >>> >
> >>> > This may be where our perspectives diverge. I did not find those
> >>> > licenses
> >>> > embedded in the NOTICE file. As I see it, the licenses are cited
but not
> >>> > included.
> >>>
> >>> Pretty sure that was meant to say that NOTICE embeds other projects'
> >>> "notices", not licenses. And those notices can have all kinds of
> >>> stuff, including licenses.
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: dev-h...@spark.apache.org
> >>>
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

re: Derby version used by Hive

2015-09-28 Thread Richard Hillegas


I haven't received a response to the following message, which I posted last
week. Maybe my message rambled too much. Here is an attempt to pose my
question more succinctly:

Q: Does anyone know of any reason why we can't upgrade Hive's Derby version
to 10.12.1.1, the new version being vetted by the Derby community right
now?

Thanks,
-Rick

> I am following the Hive build instructions here:
>
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
> .
>
> I noticed that Hive development seems to be using an old version of
Derby:
> 10.10.2.0. Is there some defect in the most recent Derby version
> (10.11.1.1) which prevents Hive from upgrading to 10.11.1.1? The only
> Hive-tagged Derby bug which I can find is
> https://issues.apache.org/jira/browse/DERBY-6358. That issue doesn't seem
> to be version-specific and it mentions a resolved Hive issue:
> https://issues.apache.org/jira/browse/HIVE-8739.
>
> Staying with 10.10.2.0 makes sense if you need to run on some ancient
JVMs:
> Java SE 5 or Java ME CDC/Foundation Profile 1.1. Hadoop, however,
requires
> at least Java 6 according to
> https://wiki.apache.org/hadoop/HadoopJavaVersions.
>
> Note that the Derby community expects to release version 10.12.1.1 soon:
> https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease. This might be
a
> good opportunity for Hive to upgrade to a more capable version of Derby.
>
> I mention this because the Derby version used by Hive ends up on the
> classpath used by downstream projects (like Spark). That makes it awkward
> for downstream projects to use more current Derby versions. Do you know
of
> any reason that downstream projects shouldn't override the Derby version
> currently preferred by Hive?
>
> Thanks,
> -Rick

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-24 Thread Richard Hillegas

Thanks for forking the new email thread, Reynold. It is entirely possible
that I am being overly skittish. I have posed a question for our legal
experts: https://issues.apache.org/jira/browse/LEGAL-226

To answer Sean's question on the previous email thread, I would propose
making changes like the following to the NOTICE file:

Replace a stanza like this...

"This product contains a modified version of 'JZlib', a re-implementation
of
zlib in pure Java, which can be obtained at:

  * LICENSE:
* license/LICENSE.jzlib.txt (BSD Style License)
  * HOMEPAGE:
* http://www.jcraft.com/jzlib/;

...with full license text like this

"This product contains a modified version of 'JZlib', a re-implementation
of
zlib in pure Java, which can be obtained at:

  * HOMEPAGE:
* http://www.jcraft.com/jzlib/

The ZLIB license text follows:

JZlib 0.0.* were released under the GNU LGPL license.  Later, we have
switched
over to a BSD-style license.

--
Copyright (c) 2000-2011 ymnk, JCraft,Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice,
 this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
 notice, this list of conditions and the following disclaimer in
 the documentation and/or other materials provided with the
distribution.

  3. The names of the authors may not be used to endorse or promote
products
 derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JCRAFT,
INC. OR ANY CONTRIBUTORS TO THIS SOFTWARE BE LIABLE FOR ANY DIRECT,
INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."

Thanks,
-Rick



Reynold Xin <r...@databricks.com> wrote on 09/24/2015 10:55:53 AM:

> From: Reynold Xin <r...@databricks.com>
> To: Sean Owen <so...@cloudera.com>
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> <dev@spark.apache.org>
> Date: 09/24/2015 10:56 AM
> Subject: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Richard,
>
> Thanks for bringing this up and this is a great point. Let's start
> another thread for it so we don't hijack the release thread.
>
> On Thu, Sep 24, 2015 at 10:51 AM, Sean Owen <so...@cloudera.com> wrote:
> On Thu, Sep 24, 2015 at 6:45 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> > Under your guidance, I would be happy to help compile a NOTICE file
which
> > follows the pattern used by Derby and the JDK. This effort might
proceed in
> > parallel with vetting 1.5.1 and could be targeted at a later release
> > vehicle. I don't think that the ASF's exposure is greatly increased by
one
> > more release which follows the old pattern.
>
> I'd prefer to use the ASF's preferred pattern, no? That's what we've
> been trying to do and seems like we're even required to do so, not
> follow a different convention. There is some specific guidance there
> about what to add, and not add, to these files. Specifically, because
> the AL2 requires downstream projects to embed the contents of NOTICE,
> the guidance is to only include elements in NOTICE that must appear
> there.
>
> Put it this way -- what would you like to change specifically? (you
> can start another thread for that)
>
> >> My assessment (just looked before I saw Sean's email) is the same as
> >> his. The NOTICE file embeds other projects' licenses.
> >
> > This may be where our perspectives diverge. I did not find those
licenses
> > embedded in the NOTICE file. As I see it, the licenses are cited but
not
> > included.
>
> Pretty sure that was meant to say that NOTICE embeds other projects'
> "notices", not licenses. And those notices can have all kinds of
> stuff, including licenses.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org

Re: [Discuss] NOTICE file for transitive "NOTICE"s

2015-09-24 Thread Richard Hillegas

Thanks for that pointer, Sean. It may be that Derby is putting the license
information in the wrong place, viz. in the NOTICE file. But the 3rd party
license text may need to go somewhere else. See for instance the advice a
little further up the page at
http://www.apache.org/dev/licensing-howto.html#permissive-deps

Thanks,
-Rick

Sean Owen <so...@cloudera.com> wrote on 09/24/2015 12:07:01 PM:

> From: Sean Owen <so...@cloudera.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>
> Date: 09/24/2015 12:08 PM
> Subject: Re: [Discuss] NOTICE file for transitive "NOTICE"s
>
> Have a look at http://www.apache.org/dev/licensing-howto.html#mod-notice
> though, which makes a good point about limiting what goes into NOTICE
> to what is required. That's what makes me think we shouldn't do this.
>
> On Thu, Sep 24, 2015 at 7:24 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> > To answer Sean's question on the previous email thread, I would propose
> > making changes like the following to the NOTICE file:
>

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-24 Thread Richard Hillegas

Hi Sean and Wendell,

I share your concerns about how difficult and important it is to get this
right. I think that the Spark community has compiled a very readable and
well organized NOTICE file. A lot of careful thought went into gathering
together 3rd party projects which share the same license text.

All I can offer is my own experience of having served as a release manager
for a sister Apache project (Derby) over the past ten years. The Derby
NOTICE file recites 3rd party licenses verbatim. This is also the approach
taken by the THIRDPARTYLICENSEREADME.txt in the JDK. I am not a lawyer.
However, I have great respect for the experience and legal sensitivities of
the people who compile that JDK license file.

Under your guidance, I would be happy to help compile a NOTICE file which
follows the pattern used by Derby and the JDK. This effort might proceed in
parallel with vetting 1.5.1 and could be targeted at a later release
vehicle. I don't think that the ASF's exposure is greatly increased by one
more release which follows the old pattern.

Another comment inline...

Patrick Wendell <pwend...@gmail.com> wrote on 09/24/2015 10:24:25 AM:

> From: Patrick Wendell <pwend...@gmail.com>
> To: Sean Owen <so...@cloudera.com>
> Cc: Richard Hillegas/San Francisco/IBM@IBMUS, "dev@spark.apache.org"
> <dev@spark.apache.org>
> Date: 09/24/2015 10:24 AM
> Subject: Re: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Hey Richard,
>
> My assessment (just looked before I saw Sean's email) is the same as
> his. The NOTICE file embeds other projects' licenses.

This may be where our perspectives diverge. I did not find those licenses
embedded in the NOTICE file. As I see it, the licenses are cited but not
included.

Thanks,
-Rick


> If those
> licenses themselves have pointers to other files or dependencies, we
> don't embed them. I think this is standard practice.
>
> - Patrick
>
> On Thu, Sep 24, 2015 at 10:00 AM, Sean Owen <so...@cloudera.com> wrote:
> > Hi Richard, those are messages reproduced from other projects' NOTICE
> > files, not created by Spark. They need to be reproduced in Spark's
> > NOTICE file to comply with the license, but their text may or may not
> > apply to Spark's distribution. The intent is that users would track
> > this back to the source project if interested to investigate what the
> > upstream notice is about.
> >
> > Requirements vary by license, but I do not believe there is additional
> > requirement to reproduce these other files. Their license information
> > is already indicated in accordance with the license terms.
> >
> > What licenses are you looking for in LICENSE that you believe
> should be there?
> >
> > Getting all this right is both difficult and important. I've made some
> > efforts over time to strictly comply with the Apache take on
> > licensing, which is at http://www.apache.org/legal/resolved.html  It's
> > entirely possible there's still a mistake somewhere in here (possibly
> > a new dependency, etc). Please point it out if you see such a thing.
> >
> > But so far what you describe is "working as intended", as far as I
> > know, according to Apache.
> >
> >
> > On Thu, Sep 24, 2015 at 5:52 PM, Richard Hillegas
> <rhil...@us.ibm.com> wrote:
> >> -1 (non-binding)
> >>
> >> I was able to build Spark cleanly from the source distribution using
the
> >> command in README.md:
> >>
> >> build/mvn -DskipTests clean package
> >>
> >> However, while I was waiting for the build to complete, I started
going
> >> through the NOTICE file. I was confused about where to find
> licenses for 3rd
> >> party software bundled with Spark. About halfway through the NOTICE
file,
> >> starting with Java Collections Framework, there is a list of
> licenses of the
> >> form
> >>
> >>license/*.txt
> >>
> >> But there is no license subdirectory in the source distro. I couldn't
find
> >> the  *.txt license files for Java Collections Framework, Base64
Encoder, or
> >> JZlib anywhere in the source distro. I couldn't find those files in
license
> >> subdirectories at the indicated home pages for those projects. (I did
find
> >> the license for JZLIB somewhere else, however:
> >> http://www.jcraft.com/jzlib/LICENSE.txt.)
> >>
> >> In addition, I couldn't find licenses for those projects in the master
> >> LICENSE file.
> >>
> >> Are users supposed to get licenses from the indicated 3rd party web
sites?
> >> Those online licenses could change. I would feel more comfortable if
the ASF
> >> w

Re: [VOTE] Release Apache Spark 1.5.1 (RC1)

2015-09-24 Thread Richard Hillegas

-1 (non-binding)

I was able to build Spark cleanly from the source distribution using the
command in README.md:

build/mvn -DskipTests clean package

However, while I was waiting for the build to complete, I started going
through the NOTICE file. I was confused about where to find licenses for
3rd party software bundled with Spark. About halfway through the NOTICE
file, starting with Java Collections Framework, there is a list of licenses
of the form

   license/*.txt

But there is no license subdirectory in the source distro. I couldn't find
the  *.txt license files for Java Collections Framework, Base64 Encoder, or
JZlib anywhere in the source distro. I couldn't find those files in license
subdirectories at the indicated home pages for those projects. (I did find
the license for JZLIB somewhere else, however:
http://www.jcraft.com/jzlib/LICENSE.txt.)

In addition, I couldn't find licenses for those projects in the master
LICENSE file.

Are users supposed to get licenses from the indicated 3rd party web sites?
Those online licenses could change. I would feel more comfortable if the
ASF were protected by our bundling the licenses inside our source distros.

After looking for those three licenses, I stopped reading the NOTICE file.
Maybe I'm confused about how to read the NOTICE file. Where should users
expect to find the 3rd party licenses?

Thanks,
-Rick

Reynold Xin  wrote on 09/24/2015 12:27:25 AM:

> From: Reynold Xin 
> To: "dev@spark.apache.org" 
> Date: 09/24/2015 12:28 AM
> Subject: [VOTE] Release Apache Spark 1.5.1 (RC1)
>
> Please vote on releasing the following candidate as Apache Spark
> version 1.5.1. The vote is open until Sun, Sep 27, 2015 at 10:00 UTC
> and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.5.1
> [ ] -1 Do not release this package because ...
>
> The release fixes 81 known issues in Spark 1.5.0, listed here:
> http://s.apache.org/spark-1.5.1
>
> The tag to be voted on is v1.5.1-rc1:
> https://github.com/apache/spark/commit/
> 4df97937dbf68a9868de58408b9be0bf87dbbb94
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release (1.5.1) can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1148/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-docs/
>
> ===
> How can I help test this release?
> ===
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate,
> then reporting any regressions.
>
> 
> What justifies a -1 vote for this release?
> 
> -1 vote should occur for regressions from Spark 1.5.0. Bugs already
> present in 1.5.0 will not block this release.
>
> ===
> What should happen to JIRA tickets still targeting 1.5.1?
> ===
> Please target 1.5.2 or 1.6.0.

Re: unsubscribe

2015-09-23 Thread Richard Hillegas

Hi Ntale,

To unsubscribe from the user list, please send a message to
user-unsubscr...@spark.apache.org as described here:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

Ntale Lukama  wrote on 09/23/2015 04:34:48 AM:

> From: Ntale Lukama 
> To: user 
> Date: 09/23/2015 04:35 AM
> Subject: unsubscribe

Derby version used by Hive

2015-09-23 Thread Richard Hillegas


I am following the Hive build instructions here:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
.

I noticed that Hive development seems to be using an old version of Derby:
10.10.2.0. Is there some defect in the most recent Derby version
(10.11.1.1) which prevents Hive from upgrading to 10.11.1.1? The only
Hive-tagged Derby bug which I can find is
https://issues.apache.org/jira/browse/DERBY-6358. That issue doesn't seem
to be version-specific and it mentions a resolved Hive issue:
https://issues.apache.org/jira/browse/HIVE-8739.

Staying with 10.10.2.0 makes sense if you need to run on some ancient JVMs:
Java SE 5 or Java ME CDC/Foundation Profile 1.1. Hadoop, however, requires
at least Java 6 according to
https://wiki.apache.org/hadoop/HadoopJavaVersions.

Note that the Derby community expects to release version 10.12.1.1 soon:
https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease. This might be a
good opportunity for Hive to upgrade to a more capable version of Derby.

I mention this because the Derby version used by Hive ends up on the
classpath used by downstream projects (like Spark). That makes it awkward
for downstream projects to use more current Derby versions. Do you know of
any reason that downstream projects shouldn't override the Derby version
currently preferred by Hive?

Thanks,
-Rick

column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas


I am puzzled by the behavior of column identifiers in Spark SQL. I don't
find any guidance in the "Spark SQL and DataFrame Guide" at
http://spark.apache.org/docs/latest/sql-programming-guide.html. I am seeing
odd behavior related to case-sensitivity and to delimited (quoted)
identifiers.

Consider the following declaration of a table in the Derby relational
database, whose dialect hews closely to the SQL Standard:

   create table app.t( a int, "b" int, "c""d" int );

Now let's load that table into Spark like this:

  import org.apache.spark.sql._
  import org.apache.spark.sql.types._

  val df = sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
"dbtable" -> "app.t")).load()
  df.registerTempTable("test_data")

The following query runs fine because the column name matches the
normalized form in which it is stored in the metadata catalogs of the
relational database:

  // normalized column names are recognized
  sqlContext.sql(s"""select A from test_data""").show

But the following query fails during name resolution. This puzzles me
because non-delimited identifiers are case-insensitive in the ANSI/ISO
Standard. They are also supposed to be case-insensitive in HiveQL, at least
according to section 2.3.1 of the QuotedIdentifier.html webpage attached to
https://issues.apache.org/jira/browse/HIVE-6013:

  // ...unnormalized column names raise this error:
org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input
columns A, b, c"d;
  sqlContext.sql("""select a from test_data""").show

Delimited (quoted) identifiers are treated as string literals. Again,
non-Standard behavior:

  // this returns rows consisting of the string literal "b"
  sqlContext.sql("""select "b" from test_data""").show

Embedded quotes in delimited identifiers won't even parse:

  // embedded quotes raise this error: java.lang.RuntimeException: [1.11]
failure: ``union'' expected but "d" found
  sqlContext.sql("""select "c""d" from test_data""").show

This behavior is non-Standard and it strikes me as hard to describe to
users concisely. Would the community support an effort to bring the
handling of column identifiers into closer conformance with the Standard?
Would backward compatibility concerns even allow us to do that?

Thanks,
-Rick

Derby version in Spark

2015-09-22 Thread Richard Hillegas


I see that lib_managed/jars holds these old Derby versions:

  lib_managed/jars/derby-10.10.1.1.jar
  lib_managed/jars/derby-10.10.2.0.jar

The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and
Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running
Spark on the resource-constrained Java ME platform. Is Spark really
deployed on Java SE 5? Is there some other reason that Spark uses the 10.10
Derby family?

If no-one needs those ancient JVMs, maybe we could consider changing the
Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1 release (both
run on Java 6 and up).

Thanks,
-Rick

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas

Thanks for that tip, Michael. I think that my sqlContext was a raw
SQLContext originally. I have rebuilt Spark like so...

  sbt/sbt -Phive assembly/assembly

Now I see that my sqlContext is a HiveContext. That fixes one of the
queries. Now unnormalized column names work:

  // ...unnormalized column names work now
  sqlContext.sql("""select a from test_data""").show

However, quoted identifiers are still treated as string literals:

  // this still returns rows consisting of the string literal "b"
  sqlContext.sql("""select "b" from test_data""").show

And embedded quotes inside quoted identifiers are swallowed up:

  // this now returns rows consisting of the string literal "cd"
  sqlContext.sql("""select "c""d" from test_data""").show

Thanks,
-Rick

Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 10:58:36 AM:

> From: Michael Armbrust <mich...@databricks.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 10:59 AM
> Subject: Re: column identifiers in Spark SQL
>
> Are you using a SQLContext or a HiveContext?  The programming guide
> suggests the latter, as the former is really only there because some
> applications may have conflicts with Hive dependencies.  SQLContext
> is case sensitive by default whereas the HiveContext is not.  The
> parser in HiveContext is also a lot better.
>
> On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> I am puzzled by the behavior of column identifiers in Spark SQL. I
> don't find any guidance in the "Spark SQL and DataFrame Guide" at
> http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> seeing odd behavior related to case-sensitivity and to delimited
> (quoted) identifiers.
>
> Consider the following declaration of a table in the Derby
> relational database, whose dialect hews closely to the SQL Standard:
>
>    create table app.t( a int, "b" int, "c""d" int );
>
> Now let's load that table into Spark like this:
>
>   import org.apache.spark.sql._
>   import org.apache.spark.sql.types._
>
>   val df = sqlContext.read.format("jdbc").options(
>     Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
>     "dbtable" -> "app.t")).load()
>   df.registerTempTable("test_data")
>
> The following query runs fine because the column name matches the
> normalized form in which it is stored in the metadata catalogs of
> the relational database:
>
>   // normalized column names are recognized
>   sqlContext.sql(s"""select A from test_data""").show
>
> But the following query fails during name resolution. This puzzles
> me because non-delimited identifiers are case-insensitive in the
> ANSI/ISO Standard. They are also supposed to be case-insensitive in
> HiveQL, at least according to section 2.3.1 of the
> QuotedIdentifier.html webpage attached to https://issues.apache.org/
> jira/browse/HIVE-6013:
>
>   // ...unnormalized column names raise this error:
> org.apache.spark.sql.AnalysisException: cannot resolve 'a' given
> input columns A, b, c"d;
>   sqlContext.sql("""select a from test_data""").show
>
> Delimited (quoted) identifiers are treated as string literals.
> Again, non-Standard behavior:
>
>   // this returns rows consisting of the string literal "b"
>   sqlContext.sql("""select "b" from test_data""").show
>
> Embedded quotes in delimited identifiers won't even parse:
>
>   // embedded quotes raise this error: java.lang.RuntimeException:
> [1.11] failure: ``union'' expected but "d" found
>   sqlContext.sql("""select "c""d" from test_data""").show
>
> This behavior is non-Standard and it strikes me as hard to describe
> to users concisely. Would the community support an effort to bring
> the handling of column identifiers into closer conformance with the
> Standard? Would backward compatibility concerns even allow us to do that?
>
> Thanks,
> -Rick

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas

Thanks, Ted. I'll follow up with the Hive folks.

Cheers,
-Rick

Ted Yu <yuzhih...@gmail.com> wrote on 09/22/2015 03:41:12 PM:

> From: Ted Yu <yuzhih...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 03:41 PM
> Subject: Re: Derby version in Spark
>
> I cloned Hive 1.2 code base and saw:
>
>     10.10.2.0
>
> So the version used by Spark is quite close to what Hive uses.
>
> On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I see.
> I use maven to build so I observe different contents under
> lib_managed directory.
>
> Here is snippet of dependency tree:
>
> [INFO] |  +-
org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile
> [INFO] |  |  +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
> [INFO] |  |  +- org.apache.derby:derby:jar:10.10.1.1:compile
>
> On Tue, Sep 22, 2015 at 3:21 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Thanks, Ted. I'm working on my master branch. The lib_managed/jars
> directory has a lot of jarballs, including hadoop and hive. Maybe
> these were faulted in when I built with the following command?
>
>   sbt/sbt -Phive assembly/assembly
>
> The Derby jars seem to be used in order to manage the metastore_db
> database. Maybe my question should be directed to the Hive community?
>
> Thanks,
> -Rick
>
> Here are the gory details:
>
> bash-3.2$ ls lib_managed/jars
> FastInfoset-1.2.12.jar curator-test-2.4.0.jar jersey-test-framework-
> grizzly2-1.9.jar parquet-format-2.3.0-incubating.jar
> JavaEWAH-0.3.2.jar datanucleus-api-jdo-3.2.6.jar jets3t-0.7.1.jar
> parquet-generator-1.7.0.jar
> ST4-4.0.4.jar datanucleus-core-3.2.10.jar jetty-continuation-8.1.
> 14.v20131031.jar parquet-hadoop-1.7.0.jar
> activation-1.1.jar datanucleus-rdbms-3.2.9.jar jetty-http-8.1.
> 14.v20131031.jar parquet-hadoop-bundle-1.6.0.jar
> akka-actor_2.10-2.3.11.jar derby-10.10.1.1.jar jetty-io-8.1.
> 14.v20131031.jar parquet-jackson-1.7.0.jar
> akka-remote_2.10-2.3.11.jar derby-10.10.2.0.jar jetty-jndi-8.1.
> 14.v20131031.jar platform-3.4.0.jar
> akka-slf4j_2.10-2.3.11.jar genjavadoc-plugin_2.10.4-0.9-spark0.jar
> jetty-plus-8.1.14.v20131031.jar pmml-agent-1.1.15.jar
> akka-testkit_2.10-2.3.11.jar groovy-all-2.1.6.jar jetty-security-8.
> 1.14.v20131031.jar pmml-model-1.1.15.jar
> antlr-2.7.7.jar guava-11.0.2.jar jetty-server-8.1.14.v20131031.jar
> pmml-schema-1.1.15.jar
> antlr-runtime-3.4.jar guice-3.0.jar jetty-servlet-8.1.
> 14.v20131031.jar postgresql-9.3-1102-jdbc41.jar
> aopalliance-1.0.jar h2-1.4.183.jar jetty-util-6.1.26.jar py4j-0.8.2.1.jar
> arpack_combined_all-0.1-javadoc.jar hadoop-annotations-2.2.0.jar
> jetty-util-8.1.14.v20131031.jar pyrolite-4.4.jar
> arpack_combined_all-0.1.jar hadoop-auth-2.2.0.jar jetty-webapp-8.1.
> 14.v20131031.jar quasiquotes_2.10-2.0.0.jar
> asm-3.2.jar hadoop-client-2.2.0.jar jetty-websocket-8.1.
> 14.v20131031.jar reflectasm-1.07-shaded.jar
> avro-1.7.4.jar hadoop-common-2.2.0.jar jetty-xml-8.1.
> 14.v20131031.jar sac-1.3.jar
> avro-1.7.7.jar hadoop-hdfs-2.2.0.jar jline-0.9.94.jar scala-
> compiler-2.10.0.jar
> avro-ipc-1.7.7-tests.jar hadoop-mapreduce-client-app-2.2.0.jar
> jline-2.10.4.jar scala-compiler-2.10.4.jar
> avro-ipc-1.7.7.jar hadoop-mapreduce-client-common-2.2.0.jar jline-2.
> 12.jar scala-library-2.10.4.jar
> avro-mapred-1.7.7-hadoop2.jar hadoop-mapreduce-client-core-2.2.0.jar
> jna-3.4.0.jar scala-reflect-2.10.4.jar
> breeze-macros_2.10-0.11.2.jar hadoop-mapreduce-client-jobclient-2.2.
> 0.jar joda-time-2.5.jar scalacheck_2.10-1.11.3.jar
> breeze_2.10-0.11.2.jar hadoop-mapreduce-client-shuffle-2.2.0.jar
> jodd-core-3.5.2.jar scalap-2.10.0.jar
> calcite-avatica-1.2.0-incubating.jar hadoop-yarn-api-2.2.0.jar
> json-20080701.jar selenium-api-2.42.2.jar
> calcite-core-1.2.0-incubating.jar hadoop-yarn-client-2.2.0.jar
> json-20090211.jar selenium-chrome-driver-2.42.2.jar
> calcite-linq4j-1.2.0-incubating.jar hadoop-yarn-common-2.2.0.jar
> json4s-ast_2.10-3.2.10.jar selenium-firefox-driver-2.42.2.jar
> cglib-2.2.1-v20090111.jar hadoop-yarn-server-common-2.2.0.jar
> json4s-core_2.10-3.2.10.jar selenium-htmlunit-driver-2.42.2.jar
> cglib-nodep-2.1_3.jar hadoop-yarn-server-nodemanager-2.2.0.jar
> json4s-jackson_2.10-3.2.10.jar selenium-ie-driver-2.42.2.jar
> chill-java-0.5.0.jar hamcrest-core-1.1.jar jsr173_api-1.0.jar
> selenium-java-2.42.2.jar
> chill_2.10-0.5.0.jar hamcrest-core-1.3.jar jsr305-1.3.9.jar
> selenium-remote-driver-2.42.2.jar
> commons-beanutils-1.7.0.jar hamcrest-library-1.3.jar jsr305-2.0.
> 1.jar selenium-safari-driver-2.42.2.jar
> commons-beanutils-core-1.8.0.jar hive-exec-1.2.1.spark.jar jta-1.
> 1.jar selenium-support-2.42.2.jar
&

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas

Thanks for that additional tip, Michael. Backticks fix the problem query in
which an identifier was transformed into a string literal. So this works
now...

  // now correctly resolves the unnormalized column id
  sqlContext.sql("""select `b` from test_data""").show

Any suggestion about how to escape an embedded double quote?

  // java.sql.SQLSyntaxErrorException: Syntax error: Encountered "\"" at
line 1, column 12.
  sqlContext.sql("""select `c"d` from test_data""").show

  // org.apache.spark.sql.AnalysisException: cannot resolve 'c\"d' given
input columns A, b, c"d; line 1 pos 7
  sqlContext.sql("""select `c\"d` from test_data""").show

Thanks,
-Rick

Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 01:16:12 PM:

> From: Michael Armbrust <mich...@databricks.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 01:16 PM
> Subject: Re: column identifiers in Spark SQL
>
> HiveQL uses `backticks` for quoted identifiers.
>
> On Tue, Sep 22, 2015 at 1:06 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Thanks for that tip, Michael. I think that my sqlContext was a raw
> SQLContext originally. I have rebuilt Spark like so...
>
>   sbt/sbt -Phive assembly/assembly
>
> Now I see that my sqlContext is a HiveContext. That fixes one of the
> queries. Now unnormalized column names work:
>
>   // ...unnormalized column names work now
>   sqlContext.sql("""select a from test_data""").show
>
> However, quoted identifiers are still treated as string literals:
>
>   // this still returns rows consisting of the string literal "b"
>   sqlContext.sql("""select "b" from test_data""").show
>
> And embedded quotes inside quoted identifiers are swallowed up:
>
>   // this now returns rows consisting of the string literal "cd"
>   sqlContext.sql("""select "c""d" from test_data""").show
>
> Thanks,
> -Rick
>
> Michael Armbrust <mich...@databricks.com> wrote on 09/22/2015 10:58:36
AM:
>
> > From: Michael Armbrust <mich...@databricks.com>
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev <dev@spark.apache.org>
> > Date: 09/22/2015 10:59 AM
> > Subject: Re: column identifiers in Spark SQL
>
> >
> > Are you using a SQLContext or a HiveContext?  The programming guide
> > suggests the latter, as the former is really only there because some
> > applications may have conflicts with Hive dependencies.  SQLContext
> > is case sensitive by default where as the HiveContext is not.  The
> > parser in HiveContext is also a lot better.
> >
> > On Tue, Sep 22, 2015 at 10:53 AM, Richard Hillegas <rhil...@us.ibm.com
> > wrote:
> > I am puzzled by the behavior of column identifiers in Spark SQL. I
> > don't find any guidance in the "Spark SQL and DataFrame Guide" at
> > http://spark.apache.org/docs/latest/sql-programming-guide.html. I am
> > seeing odd behavior related to case-sensitivity and to delimited
> > (quoted) identifiers.
> >
> > Consider the following declaration of a table in the Derby
> > relational database, whose dialect hews closely to the SQL Standard:
> >
> >    create table app.t( a int, "b" int, "c""d" int );
> >
> > Now let's load that table into Spark like this:
> >
> >   import org.apache.spark.sql._
> >   import org.apache.spark.sql.types._
> >
> >   val df = sqlContext.read.format("jdbc").options(
> >     Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
> >     "dbtable" -> "app.t")).load()
> >   df.registerTempTable("test_data")
> >
> > The following query runs fine because the column name matches the
> > normalized form in which it is stored in the metadata catalogs of
> > the relational database:
> >
> >   // normalized column names are recognized
> >   sqlContext.sql(s"""select A from test_data""").show
> >
> > But the following query fails during name resolution. This puzzles
> > me because non-delimited identifiers are case-insensitive in the
> > ANSI/ISO Standard. They are also supposed to be case-insensitive in
> > HiveQL, at least according to section 2.3.1 of the
> > QuotedIdentifier.html webpage attached to https://issues.apache.org/
> > jira/browse/HIVE-6013:
> >
> >   // 

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
.jar  htmlunit-2.14.jar
jul-to-slf4j-1.7.10.jar slf4j-api-1.7.10.jar
commons-codec-1.4.jar   htmlunit-core-js-2.14.jar
junit-4.10.jar  slf4j-log4j12-1.7.10.jar
commons-codec-1.5.jar   httpclient-4.3.2.jar
junit-dep-4.10.jar  snappy-0.2.jar
commons-codec-1.9.jar   httpcore-4.3.1.jar
junit-dep-4.8.2.jar 
spire-macros_2.10-0.7.4.jar
commons-collections-3.2.1.jar   httpmime-4.3.2.jar
junit-interface-0.10.jarspire_2.10-0.7.4.jar
commons-compiler-2.7.8.jar  istack-commons-runtime-2.16.jar
junit-interface-0.9.jar 
stax-api-1.0.1.jar
commons-compress-1.4.1.jar  ivy-2.4.0.jar
libfb303-0.9.2.jar  stream-2.7.0.jar
commons-configuration-1.6.jar   jackson-core-asl-1.8.8.jar
libthrift-0.9.2.jar stringtemplate-3.2.1.jar
commons-dbcp-1.4.jarjackson-core-asl-1.9.13.jar
lz4-1.3.0.jar   tachyon-client-0.7.1.jar
commons-digester-1.8.jarjackson-jaxrs-1.8.8.jar
mesos-0.21.1-shaded-protobuf.jar
tachyon-underfs-hdfs-0.7.1.jar
commons-exec-1.1.jarjackson-mapper-asl-1.9.13.jar
minlog-1.2.jar
tachyon-underfs-local-0.7.1.jar
commons-httpclient-3.1.jar  jackson-xc-1.8.8.jar
mockito-core-1.9.5.jar  test-interface-0.5.jar
commons-io-2.1.jar  janino-2.7.8.jar
mysql-connector-java-5.1.34.jar test-interface-1.0.jar
commons-io-2.4.jar  jansi-1.4.jar
nekohtml-1.9.20.jar 
uncommons-maths-1.2.2a.jar
commons-lang-2.5.jarjavassist-3.15.0-GA.jar
netty-all-4.0.29.Final.jar  unused-1.0.0.jar
commons-lang-2.6.jarjavax.inject-1.jar
objenesis-1.0.jar   webbit-0.4.14.jar
commons-lang3-3.3.2.jar jaxb-api-2.2.2.jar
objenesis-1.2.jar   xalan-2.7.1.jar
commons-logging-1.1.3.jar   jaxb-api-2.2.7.jar
opencsv-2.3.jar xercesImpl-2.11.0.jar
commons-math-2.1.jarjaxb-core-2.2.7.jar
oro-2.0.8.jar   xml-apis-1.4.01.jar
commons-math-2.2.jarjaxb-impl-2.2.3-1.jar
paranamer-2.3.jar   xmlenc-0.52.jar
commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar
paranamer-2.6.jar   xz-1.0.jar
commons-net-3.1.jar jblas-1.2.4.jar
parquet-avro-1.7.0.jar  zookeeper-3.4.5.jar
commons-pool-1.5.4.jar  jcl-over-slf4j-1.7.10.jar
parquet-column-1.7.0.jar
core-1.1.2.jar  jdo-api-3.0.1.jar
parquet-common-1.7.0.jar
cssparser-0.9.13.jarjersey-guice-1.9.jar
parquet-encoding-1.7.0.jar

Ted Yu <yuzhih...@gmail.com> wrote on 09/22/2015 01:32:39 PM:

> From: Ted Yu <yuzhih...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: Dev <dev@spark.apache.org>
> Date: 09/22/2015 01:33 PM
> Subject: Re: Derby version in Spark
>
> Which Spark release are you building ?
>
> For master branch, I get the following:
>
> lib_managed/jars/datanucleus-api-jdo-3.2.6.jar  lib_managed/jars/
> datanucleus-core-3.2.10.jar  lib_managed/jars/datanucleus-rdbms-3.2.9.jar
>
> FYI
>
> On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> I see that lib_managed/jars holds these old Derby versions:
>
>   lib_managed/jars/derby-10.10.1.1.jar
>   lib_managed/jars/derby-10.10.2.0.jar
>
> The Derby 10.10 release family supports some ancient JVMs: Java SE 5
> and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone
> running Spark on the resource-constrained Java ME platform. Is Spark
> really deployed on Java SE 5? Is there some other reason that Spark
> uses the 10.10 Derby family?
>
> If no-one needs those ancient JVMs, maybe we could consider changing
> the Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1
> release (both run on Java 6 and up).
>
> Thanks,
> -Rick

Re: Count for select not matching count for group by

2015-09-21 Thread Richard Hillegas
For what it's worth, I get the expected result that "filter" behaves like
"group by" when I run the same experiment against a DataFrame which was
loaded from a relational store:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val df = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1",
  "dbtable" -> "app.outcomes")).load()

df.select("OUTCOME").groupBy("OUTCOME").count.show
#
# returns:
#
# +-------+-----+
# |OUTCOME|count|
# +-------+-----+
# |      A|  128|
# |      B|  256|
# +-------+-----+

df.filter("OUTCOME = 'A'").count
#
# returns:
#
# res1: Long = 128


df.registerTempTable("test_data")
sqlContext.sql("select OUTCOME, count( OUTCOME ) from test_data group by
OUTCOME").show
#
# returns:
#
# +-------+---+
# |OUTCOME|_c1|
# +-------+---+
# |      A|128|
# |      B|256|
# +-------+---+
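
One thing that might be worth ruling out (just a guess on my part) is a stale
cached copy of the DataFrame. A minimal sketch of that check, assuming the
Spark 1.5 DataFrame API and your column name:

// Hypothetical diagnostic: drop the cached blocks, recompute both counts from
// the source tables, and compare them with the numbers you saw earlier.
df.unpersist()
val grouped  = df.groupBy("outcome").count()       // per-outcome counts
val filtered = df.filter("outcome = 'A'").count()  // count for one outcome
grouped.show()
println("filter count for 'A' = " + filtered)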

Thanks,
-Rick

Michael Kelly wrote on 09/21/2015 08:06:29 AM:

> From: Michael Kelly 
> To: user@spark.apache.org
> Date: 09/21/2015 08:08 AM
> Subject: Count for select not matching count for group by
>
> Hi,
>
> I'm seeing some strange behaviour with Spark 1.5. I have a dataframe
> that I built by loading and joining some Hive tables stored in S3.
>
> The dataframe is cached in memory, using df.cache.
>
> What I'm seeing is that the counts I get when I do a group by on a
> column are different from what I get when I filter/select and count.
>
> df.select("outcome").groupBy("outcome").count.show
> outcome | count
> --
> 'A'   |  100
> 'B'   |  200
>
> df.filter("outcome = 'A'").count
> # 50
>
> df.filter(df("outcome") === "A").count
> # 50
>
> I expect the count of rows that match 'A' in the groupBy to match
> the count when filtering. Any ideas what might be happening?
>
> Thanks,
>
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

Re: Unsubscribe

2015-09-21 Thread Richard Hillegas

To unsubscribe from the dev list, please send a message to
dev-unsubscr...@spark.apache.org as described here:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

Dulaj Viduranga  wrote on 09/21/2015 10:15:58 AM:

> From: Dulaj Viduranga 
> To: dev@spark.apache.org
> Date: 09/21/2015 10:16 AM
> Subject: Unsubscribe
>
> Unsubscribe
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

[VOTE] 10.12.1.1 release

2015-09-20 Thread Richard Hillegas
Please test-drive the 10.12.1.1 candidate, then vote on whether to 
accept it as a Derby release. The candidate lives at:


  http://people.apache.org/~rhillegas/10.12.1.1/

The polls close at 5:00 pm San Francisco time on Monday, October 5.

10.12.1.1 is a feature release, described in greater detail here: 
https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease


Thanks to everyone who contributed to this release.

Regards,
-Rick



Re: unsubscribe

2015-09-18 Thread Richard Hillegas

To unsubscribe from the user list, please send a message to
user-unsubscr...@spark.apache.org as described here:
http://spark.apache.org/community.html#mailing-lists.

Thanks,
-Rick

[VOTE] 10.12.1.0 release

2015-09-14 Thread Richard Hillegas
Please test-drive the 10.12.1.0 candidate, then vote on whether to 
accept it as a Derby release. The candidate lives at:


  http://people.apache.org/~rhillegas/10.12.1.0/

The polls close at 5:00 pm San Francisco time on Monday, October 5.

10.12.1.0 is a feature release, described in greater detail here: 
https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease


Thanks to everyone who contributed to this release.

Regards,
-Rick



Re: [VOTE] 10.12.1.0 release

2015-09-14 Thread Richard Hillegas

-1

When building the docs from the source distribution, the copyright year 
is wrong (2014) and the version number is wrong (10.11).


But please continue to go through the release checklist at 
http://wiki.apache.org/db-derby/TenTwelveOneChecklist and record test 
results at http://wiki.apache.org/db-derby/TenTwelveOnePlatformTesting


I plan to fix the defects which we identify and then generate a new 
release candidate next weekend.


Thanks,
-Rick

On 9/14/15 6:58 AM, Richard Hillegas wrote:
Please test-drive the 10.12.1.0 candidate, then vote on whether to 
accept it as a Derby release. The candidate lives at:


  http://people.apache.org/~rhillegas/10.12.1.0/

The polls close at 5:00 pm San Francisco time on Monday, October 5.

10.12.1.0 is a feature release, described in greater detail here: 
https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease


Thanks to everyone who contributed to this release.

Regards,
-Rick






[RESULT] [VOTE] Sunsetting support for Java 7

2015-09-13 Thread Richard Hillegas
The polls have closed. The developer community has approved the proposal 
to sunset support for Java 7.


+1 votes:

Rick Hillegas (PMC)
Bryan Pendleton (PMC)
Myrna van Lunteren (PMC)
Gary Gregory
Mike Matrigali (PMC)
Kim Haase (PMC)
Kristian Waagan (PMC)
Knut Anders Hatlen (PMC)


No other votes were cast.



need a build job for the new 10.12 branch

2015-09-13 Thread Richard Hillegas
I've just created the 10.12 branch. Could someone create a build job for 
the new branch? This is step 2 of the branch creation instructions: 
http://wiki.apache.org/db-derby/CreatingDerbyBranch


Thanks,
-Rick


Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Richard Hillegas

The latest Derby SQL Reference manual (version 10.11) can be found here:
https://db.apache.org/derby/docs/10.11/ref/index.html. It is, indeed, very
useful to have a comprehensive reference guide. The Derby build scripts can
also produce a BNF description of the grammar--but that is not part of the
public documentation for the project. The BNF is trivial to generate
because it is an artifact of the JavaCC grammar generator which Derby uses.

I appreciate the difficulty of maintaining a formal reference guide for a
rapidly evolving SQL dialect like Spark's.

A machine-generated BNF, however, is easy to imagine. But perhaps not so
easy to implement. Spark's SQL grammar is implemented in Scala, extending
the DSL support provided by the Scala language. I am new to programming in
Scala, so I don't know whether the Scala ecosystem provides any good tools
for reverse-engineering a BNF from a class which extends
scala.util.parsing.combinator.syntactical.StandardTokenParsers.
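
For illustration, here is a minimal toy grammar of my own written against
StandardTokenParsers (not Spark's actual SqlParser). It shows why there is no
declarative BNF artifact to harvest: each production is just an ordinary Scala
value built from combinators.

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Toy SELECT grammar in the parser-combinator style. The productions are
// plain Scala methods composed with ~, |, opt and repsep, so the grammar
// exists only as compiled code rather than as a separate BNF description.
object ToySqlParser extends StandardTokenParsers {
  lexical.reserved ++= Seq("SELECT", "FROM", "WHERE")
  lexical.delimiters ++= Seq(",", "=", "*")

  def select: Parser[Any] =
    "SELECT" ~ projection ~ "FROM" ~ ident ~ opt(whereClause)

  def projection: Parser[Any] = "*" | repsep(ident, ",")

  def whereClause: Parser[Any] =
    "WHERE" ~ ident ~ "=" ~ (numericLit | stringLit | ident)

  def parse(sql: String): ParseResult[Any] =
    phrase(select)(new lexical.Scanner(sql))
}

// For example: ToySqlParser.parse("SELECT a, b FROM t WHERE a = 1")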

Thanks,
-Rick

vivekw...@gmail.com wrote on 09/11/2015 05:05:47 AM:

> From: vivek bhaskar 
> To: Ted Yu 
> Cc: user 
> Date: 09/11/2015 05:06 AM
> Subject: Re: Is there any Spark SQL reference manual?
> Sent by: vivekw...@gmail.com
>
> Hi Ted,
>
> The link you mention does not have a complete list of the supported syntax.
> For example, a few supported constructs are listed as "Supported Hive
> features", but that list does not claim to be exhaustive (and even if it
> were, one would have to sift through much of the Hive QL reference and
> still could not be sure it all applies, because the versions do not match).
>
> Quickly searching online gives me link for another popular open
> source project which has good sql reference: https://db.apache.org/
> derby/docs/10.1/ref/crefsqlj23296.html.
>
> I had a similar expectation when I was looking for all supported DDL
> and DML syntax along with their extensions. For example,
> a. Select expression along with supported extensions i.e. where
> clause, group by, different supported joins etc.
> b. SQL format for Create, Insert, Alter table etc.
> c. SQL for Insert, Update, Delete, etc along with their extensions.
> d. Syntax for view creation, if supported
> e. Syntax for explain mechanism
> f. List of supported functions, operators, etc. I can see that hundreds
> of functions were added in 1.5, but then you have to do a lot of
> cross-checking between the code and the JIRA tickets.
>
> So I wanted a piece of documentation that can provide all such
> information at a single place.
>
> Regards,
> Vivek
>
> On Fri, Sep 11, 2015 at 4:29 PM, Ted Yu  wrote:
> You may have seen this:
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Please suggest what should be added.
>
> Cheers
>
> On Fri, Sep 11, 2015 at 3:43 AM, vivek bhaskar wrote:
> Hi all,
>
> I am looking for a reference manual for Spark SQL, something like what
> many database vendors have. I could find one for Hive QL at
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual but not
> anything specific to Spark SQL.
>
> Please suggest. An SQL reference specific to the latest release would be
> of great help.
>
> Regards,
> Vivek

[VOTE] Sunsetting support for Java 7

2015-09-06 Thread Richard Hillegas
Please vote on the following proposed policy for supported platforms. 
The polls close at 5:00 pm San Francisco time on Saturday September 12.


A) The 10.12 release notes will tell users that 10.12 is the last 
release which supports Java 7. Note that we have already agreed to 
sunset support for Java 6.


B) The 10.13 release will support Java 9 and 8 as well as Java 8 compact 
profile 2. After releasing 10.12, the development trunk will no longer 
support Java 7.


C) We expect that maintenance releases on a branch will continue to 
support the same Java versions as the initial feature release cut from 
that branch. We will document this on the wiki.


Adopting this policy would result in the following changes to the 10.13 
trunk:


I) Removing build support for Java 7.

II) Purging user doc references to Java 7.

We do not anticipate that this policy will require any changes to user code.



Re: [VOTE] Sunsetting support for Java 7

2015-09-06 Thread Richard Hillegas

+1

On 9/5/15 8:02 AM, Richard Hillegas wrote:
Please vote on the following proposed policy for supported platforms. 
The polls close at 5:00 pm San Francisco time on Saturday September 12.


A) The 10.12 release notes will tell users that 10.12 is the last 
release which supports Java 7. Note that we have already agreed to 
sunset support for Java 6.


B) The 10.13 release will support Java 9 and 8 as well as Java 8 
compact profile 2. After releasing 10.12, the development trunk will 
no longer support Java 7.


C) We expect that maintenance releases on a branch will continue to 
support the same Java versions as the initial feature release cut from 
that branch. We will document this on the wiki.


Adopting this policy would result in the following changes to the 
10.13 trunk:


I) Removing build support for Java 7.

II) Purging user doc references to Java 7.

We do not anticipate that this policy will require any changes to user 
code.







derby wiki and build/test environments

2005-07-15 Thread Richard Hillegas
Is there a Derby wiki for development issues? The derby mats tests are
hanging for me and I was hoping to rummage for clues somewhere.

This is what I'm seeing: the xaStateTran tests run to conclusion
successfully but hang on exit--and this freezes the overall suite. I'm
running on the following VM:

 Java HotSpot(TM) Server VM (build 1.5.0_03-b07, mixed mode)

Any help would be appreciated.

Thanks,
-Rick



please add me as a derby developer

2005-07-14 Thread Richard Hillegas
Hello,

Please add me as a derby developer.

Thanks,
-Rick



Re: please add me as a derby developer

2005-07-14 Thread Richard Hillegas
Hi Satheesh,

Thanks. I have just created a Jira account. The userid is rhillegas. I
have also subscribed to the derby users and developers mailing lists.
What else do I need to do to become a derby developer?

Thanks,
-Rick


Satheesh Bandaram wrote:
 Hi Rick,
 
 Thanks for your interest in Derby... Do you have a Jira account? If so,
 what is it? You would need to create a Jira account first, if you don't
 have one.
 
 You would need to be part of 'derby-developers' if you have plans to
 work on bugs or enhancements to Derby.
 
 Satheesh
 
 Richard Hillegas wrote:
 
 
Hello,

Please add me as a derby developer.

Thanks,
-Rick



 

 
 



Re: please add me as a derby developer

2005-07-14 Thread Richard Hillegas
Hi Satheesh,

Yes, please. I want to start assigning myself bugs so that I can fix them.

Cheers,
-Rick

Satheesh Bandaram wrote:
 Hi Rick,
 
 I can grant the access to 'derby-developers' to you... The main reason
 for wanting this access is if you have intent to work on fixing Derby
 bugs or enhancements... Let me know if you have this intent. (and I hope
 you do :-) )
 
 Here is some info on Jira and how to work with bugs:
 http://incubator.apache.org/derby/DerbyBugGuidelines.html
 
 Satheesh
 
 Richard Hillegas wrote:
 
Hi Satheesh,

Thanks. I have just created a Jira account. The userid is rhillegas. I
have also subscribed to the derby users and developers mailing lists.
What else do I need to do to become a derby developer?

Thanks,
-Rick


Satheesh Bandaram wrote:
  

Hi Rick,

Thanks for your interest in Derby... Do you have a Jira account? If so,
what is it? You would need to create a Jira account first, if you don't
have one.

You would need to be part of 'derby-developers' if you have plans to
work on bugs or enhancements to Derby.

Satheesh

Richard Hillegas wrote:




Hello,

Please add me as a derby developer.

Thanks,
-Rick