[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-10 Thread Roberta Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999105#comment-14999105
 ] 

Roberta Marton commented on TRAFODION-1578:
---

I am just coming up to speed on this proposal.  Part of this is trying to 
understand how things will work.  Let's start with the new "upload" command.  
Let me know if I understand things correctly. 

It looks like you are proposing a new system SPJ that will upload JAR files to 
be accessible by Trafodion for procedures, functions, and table-mapping 
functions and maybe more things in the future.  To do this, you will be adding 
a new system SPJ that will accept a JAR file, an associated .md5 file, the type 
of JAR, and perhaps the CREATE text.  The upload command will check the JAR 
file against the .md5 file, it will create a library specifying some predefined 
location, create the function/procedure/table-mapping functions so it is part 
of the Trafodion metadata.  Then it will store the JAR as a LOB and/or copy it 
out to some predefined location on each node in the cluster. 
  
General comments and questions on the DDL aspect:

- This requires us to create a new system library and SPJ.  Today, we don't 
have any system SPJs.  So there will be upgrade considerations to create a 
schema such as "_SPJ_", create the library, and load the SPJ.  Have you 
considered how we will be adding this infrastructure?
- We can, at upgrade/install time, create the upload SPJ.  As mentioned before, 
we probably want to create a system schema along with this.  I assume  that 
DB__ROOT will own the schema which, in effect, makes DB__ROOT owner of the 
library and system SPJ and that initially only DB__ROOT can execute the upload 
SPJ. We can use existing grant commands to give all the required permission to 
other roles/users.  Do we need to add any additional privileges to these 
commands?  Should we allow anyone granted DB__ROOTROLE permission run this 
command also?
- Since the upload command performs SQL statements, we need to be concerned 
about who can execute it.  Today we only support invoker rights so anyone who 
can execute the command has to be trusted.  We should probably consider definer 
rights; that is allow people to run the upload command only and not grant them 
the world.  
- When you upload a jar, you need to add it to SQL metadata through a CREATE 
FUNCTION or similar command.  Will each new JAR belong to a different library 
or will a set of JAR's belong to the same library? If you combine JAR together 
in a single library, how will this work?
- while we are designing this, we should remember that we claim to be SQL ANSI 
compliant.  So we do need to make sure we follow their guidelines.  Is anything 
you are proposing differ much from ANSI?
- In your proposal "add a record in metadata table in case users upload the 
same JAR many times", it looks like you want to keep a record on how many times 
a JAR is uploaded.  Will this information be stored in a repository table or in 
the system metadata?  Again there are upgrade considerations.  I like the idea 
of keeping track of usages.  We might want to add some rules that specify how 
often and frequently a JAR is uploaded to perhaps detect security issues.  Of 
course, we want to associate the real user with the upload operation.
- will you be able to upload JAR through both sqlci and trafci?

This is an interesting proposal and am looking forward to helping out with it.  
There is a lot to be considered so we want to make sure we cover everything.  
Will you be writing up a design specification for this?  It would be helpful 
since there are a lot of conversations and a single document codify them in one 
place.

I will comment later on the overall security of things once I understand things 
better.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. 

[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-09 Thread Selvaganesan Govindarajan (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996729#comment-14996729
 ] 

Selvaganesan Govindarajan commented on TRAFODION-1578:
--

Is this proposal to reuse pre-started JVMs by different master processes to 
execute UDRs/UDFs  instead of currently running mxudr per master process or to 
start new JVMs to deploy SPJs on different nodes on the cluster? I tried to 
decipher this from the discussions but in vain. Could you please clarify it?

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Hans Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995562#comment-14995562
 ] 

Hans Zeller commented on TRAFODION-1578:


Some comments on two points made in the description:

??3. DCSServer start a JVM for the user. User can modify JVM options, program 
properties and JAVA classpath. At the same time, a monitor class will be 
starting in the JVM witch will register a node on Zookeeper for this JVM as 
well as metadata info( process id, server info and so on) and the node will be 
removed while JVM exiting. It allows customer to specify JVM idle time in case 
of some realtime senarior like Kafka consumer.??

As mentioned in the previous comment, this would work for trusted UDRs only. 
For isolated UDRs, we need a separate process. Also, I don't think we want the 
user to modify JVM options, that is the job of the DBA and would be a security 
problem. Third, there is only one JVM for a process, so the JVM of DCSServer 
would be used for the Trafodion engine as well as for trusted UDRs. The JVM 
gets started before a UDR gets invoked (e.g. for reading metadata), so when we 
invoke a UDR it's too late anyway to change the JVM parameters. Finally, we 
already have logic to reuse DCSServers, do we really need additional logic for 
that type of reuse?

??4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
long in use; Restart JVMs with latest JARs and so on.??

The DCS web GUI already lists all the DCSServers. Do we need another command in 
Trafci? Again, regular users should not be able to kill DCSServers, that's the 
job of a DBA.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Kevin Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995948#comment-14995948
 ] 

Kevin Xu commented on TRAFODION-1578:
-

For muti-thread DCS, it's DCSMaster->DCSServer->jdbcT2. For SPJ, it's 
DCSMaster->DCSServer->UDR->jdbcT2(JAVA-> C++ -> JAVA, why not JAVA -> JAVA). 
it's not necessary to create a new JVM for each connection. But for single JVM, 
it should load the JARs dynamicly as what UDR is doing. Contains SQL is trying 
to reuse JVM, right? As i mentioned, there is a customized idle timeout( 
contains non-stop) for standing by. 
yes, the users can modify JVM options directly, but it will have limitation on 
customized options like max heap size of JVM, the number of connections for one 
user. And also, allowed to start/restart. i'd like to do an experiment on a new 
command 'agent'.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Hans Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996155#comment-14996155
 ] 

Hans Zeller commented on TRAFODION-1578:


The JVMs used for the Trafodion server serve mainly the Trafodion engine and 
they need to execute multiple UDRs, since we don't know which UDRs will be 
executed when we start the master executor:

1. User connecs. A JVM is reused or started for the Trafodion engine. We don't 
know yet which UDRs the user will execute.
2. User calls SPJ a. At this time the JVM is already running.
3. User calls TMUDF b. Now we have to reuse the same JVM, if we don't use an 
MXUDR server.

Are you saying that a user should specify at connect time how much heap size 
they want to have in the JVM used in their connection? We could probably do a 
limited set of options like that.

Often it's probably the creator of a UDR, not the user of it, who may want to 
specify heap size and other parameters of the JVM used to execute a UDR. End 
users may not want to have to worry about that. Having a separate JVM for the 
UDRs should help avoid conflicts between Trafodion and UDRs.

Thanks for your patience and your explanations, I hope I understand your 
proposal to use sandboxing with a SecurityManager and setting JVM parameters a 
bit better and it makes more sense to me now. I still have questions, though, 
see comment on the thread above, about the SecurityManager being too 
restrictive for many UDRs.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Hans Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995560#comment-14995560
 ] 

Hans Zeller commented on TRAFODION-1578:


To add a bit more info about the security issues with UDRs (User-defined 
Routines):

There are trusted and isolated UDRs: Trusted UDRs run in the Trafodion engine 
and can bypass any security rules like GRANT/REVOKE. Therefore, only trusted 
users can write such UDRs. Isolated UDRs run in a separate process under a 
different user id and should not be able to break security rules.

Right now, Trafodion has neither. UDRs run in a separate process, but that 
process runs under the Trafodion id. That will need to change and we can also 
implement trusted UDRs.

Another issue is spoofing. We need to make sure that an attacker can't execute 
arbitrary code through the UDR mechanism. Our plan to solve that is by doing 
the following:

- Require that UDR libraries reside in a directory that is controlled by 
Trafodion, such that users other than the Trafodion id can't directly add or 
modify them.
- Require a special privilege to create trusted and another (lesser) privilege 
to create isolated libraries. Creating a library involves taking user code in 
form of a file in HDFS, a URL, or a local file on the server. Venkat is 
planning to add a GUI function to provide a client file as the code of a 
library.
- Have one or more special user ids that execute UDRs, don't run the MXUDR 
process under the Trafodion id as is done today.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Kevin Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996143#comment-14996143
 ] 

Kevin Xu commented on TRAFODION-1578:
-

One JVM(process) per Trafodion user by default. User knows how much heap size 
they need well than us. By the way, it may set limitations on each JVM and only 
some authorized options can be modified. i'm not saying that it'll have same 
JVM with DCSServer. DCSServer will create a JVM with a customized IDLE 
timeout(user may make decision that how long it is existing, maybe never stop). 
As i said, the new path will be DCSMaster->DCSServer->create new JVM for 
current user if not exist. Nothing about UDR.

Yes, regular users dont have permission to kill DCSServers. They have 
permission to start/stop SPJ JVM( one for each user by default).

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Kevin Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996115#comment-14996115
 ] 

Kevin Xu commented on TRAFODION-1578:
-

For authority, the checking should be done in DCSServer( check whether current 
user has privilege to call) by querying metadata tables. One JVM(process) per 
user by default. 
For arbitrary code, JVM SecurityManager should be helpful. 
  1. FilePermission in SecurityManager should be helpful.
  2. It should be solved if started JVM only has specified folder priviledge. 
Eventually, the JARs should be existing in the running node and it's fine to 
copy from HDFS at the first time. It's very good to hear that Venkat will add a 
GUI function.
  3. If JVM behaves are under control, it's not necessary to run a process with 
a special user ids(one user id per connection? if not, it will impact each 
other. Otherwise, you will have many user ids to be created.)
  
Refs: https://en.wikipedia.org/wiki/Java_security#Security_manager
  
http://docs.oracle.com/javase/7/docs/technotes/guides/security/overview/jsoverview.html
 

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-08 Thread Hans Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996145#comment-14996145
 ] 

Hans Zeller commented on TRAFODION-1578:


Thanks for the explanation and the link, yesterday I didn't realize that the 
SecurityManager can also be used to sandbox a piece of Java code. I could 
imagine that this could work for simple functions, especially for scalar UDFs 
that usually do simple, localized operations. I don't think it would work for 
TMUDFs. It might work for SPJs that do only Trafodion SQL and other simple 
operations.

What about Java UDRs that want to access HDFS or HBase or other JDBC drivers? 
What about UDRs that call other code like Spark, Kafka or MongoDB?

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-04 Thread Suresh Subbiah (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990060#comment-14990060
 ] 

Suresh Subbiah commented on TRAFODION-1578:
---

The JAR perform process is slightly different for CALL statements as described 
above.

Currently what we have is DcsMaster->DcsServer->mxosrvr (these three 
associations occur when the trafci, JDBC (T4), ODBC  app connects)
When the first CALL statement is made from this connection an MXUDR process is 
associated with mxosrvr. mxosrvr acts as master executor for the CALL 
statement. MXUDR process will host the jvm that will execute the SPJ. If the 
SPJ contains SQL, then the mxudr process itself acts as a master executor (with 
JDBC T2). Mxudr processes can be reused between different statements in the 
same connection. In this case the same JVM will be used.

In general JVM management (for those used by SQL queries) is done by the 
Trafodion engine. Is the proposal here suggesting that users may want to 
control that more directly?

Thank you.

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-03 Thread Venkat Muthuswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987928#comment-14987928
 ] 

Venkat Muthuswamy commented on TRAFODION-1578:
--

1. I suggest that we keep the interface generic not just for JARs. It should 
allow to deploy UDF files too.

2. Storing SPJs in user specific directories can be a problem. You cannot share 
it with other users. Often times, a SPJ developer or admin will deploy the SPJs 
and other users/application will call the SPJ.  So limiting access to the user 
who deployed it will not work. The security concerns can be addressed using the 
SQL privileges on the Library object which encapsulates the jar or udf file. We 
had a similar design in the pre-apache product and we had to revert to using a 
single directory or store because it became too cumbersome to manage between 
roles and users and public.

3. We should keep in mind the ODBC buffer limits and the size of the jar. The 
client (TrafCI) here will have to send the jar contents in chunks and the SPJ 
has to assemble them into the single LOB or HDFS file.

4. I have a security concern about the additional commands being proposed: 
"list all JVMs in user; kill one of them that no long in use; Restart JVMs with 
latest JARs and so on". How will you enforce security here. Consider leveraging 
and extending the Trafodion REST server for these kind of manageability 
functions. These features do not seem to belong in TrafCI.

5. Make sure the proposed interfaces prevent any Denial of Service attacks and 
adds necessary checks to limit size of the file etc..

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management

2015-11-03 Thread liu ming (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987369#comment-14987369
 ] 

liu ming commented on TRAFODION-1578:
-

I like this idea very much. It allows user to deploy SPJ via a simple trafci 
command, since itself is a SPJ, so it is also very easy to integrated into a 
GUI manager.

Need to discuss a little more:
1.  Put DLL/jar as LOB or as HDFS file?
   HDFS file seems very good to me.
2.  Location of DLL/jar at server side. 
   $MY_SQROOT/export/lib/udr/cache is a good place, and as Kevin proposed, 
there are benefit to put DLL/jar files in directories per user. So we can have 
finer control, user A cannot call user B’s SPJ for example. So maybe we can 
upload the jar to $MY_SQROOT/export/lib/udr/cache/$USER

If the SPJ/UDR need extra DLL/jar files, we can  simply put into 
$MY_SQROOT/export/lib along with the SPJ jar, $MY_SQROOT/export/lib is already 
in CLASSPATH and  LD_LIBRARY_PATH. Or we can ask the developer of UDR/SPJ to 
pack all required DLL/jar into a single file. 

> Proposal for SPJ management
> ---
>
> Key: TRAFODION-1578
> URL: https://issues.apache.org/jira/browse/TRAFODION-1578
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: connectivity-dcs
>Reporter: Kevin Xu
>
> JAR upload process:
> 1. Initialize JAR upload procedure by default
> 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and 
> create library will be done here. And also, you can only upload the JARs by 
> UPLOAD command on Trafci that it will not create a lib.
>Tips: Before put the JAR into HDFS check MD5 first, if the file exists, 
> only add a record in metadata table in case users upload the same JAR many 
> times on platform.
> 3. On server-side, the JAR will store in HDFS. At the same time JAR 
> metadata(path in HDFS, MD5 of the file, and others) stores in store procedure 
> metadata table.
> 4. create procedure is the same as now.
> JAR perform process:
> 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET.
> 2. DCSMaster assign a DCSServer for the CALL.
> 3. DCSServer start a JVM for the user. User can modify JVM options, program 
> properties and JAVA classpath. At the same time, a monitor class will be 
> starting in the JVM witch will register a node on Zookeeper for this JVM as 
> well as metadata info( process id, server info and so on) and the node will 
> be removed while JVM exiting. It allows customer to specify JVM idle time in 
> case of some realtime senarior like Kafka consumer. 
> 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no 
> long in use; Restart JVMs with latest JARs and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)