[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999105#comment-14999105 ] Roberta Marton commented on TRAFODION-1578: --- I am just coming up to speed on this proposal. Part of this is trying to understand how things will work. Let's start with the new "upload" command. Let me know if I understand things correctly. It looks like you are proposing a new system SPJ that will upload JAR files to be accessible by Trafodion for procedures, functions, and table-mapping functions and maybe more things in the future. To do this, you will be adding a new system SPJ that will accept a JAR file, an associated .md5 file, the type of JAR, and perhaps the CREATE text. The upload command will check the JAR file against the .md5 file, it will create a library specifying some predefined location, create the function/procedure/table-mapping functions so it is part of the Trafodion metadata. Then it will store the JAR as a LOB and/or copy it out to some predefined location on each node in the cluster. General comments and questions on the DDL aspect: - This requires us to create a new system library and SPJ. Today, we don't have any system SPJs. So there will be upgrade considerations to create a schema such as "_SPJ_", create the library, and load the SPJ. Have you considered how we will be adding this infrastructure? - We can, at upgrade/install time, create the upload SPJ. As mentioned before, we probably want to create a system schema along with this. I assume that DB__ROOT will own the schema which, in effect, makes DB__ROOT owner of the library and system SPJ and that initially only DB__ROOT can execute the upload SPJ. We can use existing grant commands to give all the required permission to other roles/users. Do we need to add any additional privileges to these commands? Should we allow anyone granted DB__ROOTROLE permission run this command also? - Since the upload command performs SQL statements, we need to be concerned about who can execute it. Today we only support invoker rights so anyone who can execute the command has to be trusted. We should probably consider definer rights; that is allow people to run the upload command only and not grant them the world. - When you upload a jar, you need to add it to SQL metadata through a CREATE FUNCTION or similar command. Will each new JAR belong to a different library or will a set of JAR's belong to the same library? If you combine JAR together in a single library, how will this work? - while we are designing this, we should remember that we claim to be SQL ANSI compliant. So we do need to make sure we follow their guidelines. Is anything you are proposing differ much from ANSI? - In your proposal "add a record in metadata table in case users upload the same JAR many times", it looks like you want to keep a record on how many times a JAR is uploaded. Will this information be stored in a repository table or in the system metadata? Again there are upgrade considerations. I like the idea of keeping track of usages. We might want to add some rules that specify how often and frequently a JAR is uploaded to perhaps detect security issues. Of course, we want to associate the real user with the upload operation. - will you be able to upload JAR through both sqlci and trafci? This is an interesting proposal and am looking forward to helping out with it. There is a lot to be considered so we want to make sure we cover everything. Will you be writing up a design specification for this? It would be helpful since there are a lot of conversations and a single document codify them in one place. I will comment later on the overall security of things once I understand things better. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3.
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996729#comment-14996729 ] Selvaganesan Govindarajan commented on TRAFODION-1578: -- Is this proposal to reuse pre-started JVMs by different master processes to execute UDRs/UDFs instead of currently running mxudr per master process or to start new JVMs to deploy SPJs on different nodes on the cluster? I tried to decipher this from the discussions but in vain. Could you please clarify it? > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995562#comment-14995562 ] Hans Zeller commented on TRAFODION-1578: Some comments on two points made in the description: ??3. DCSServer start a JVM for the user. User can modify JVM options, program properties and JAVA classpath. At the same time, a monitor class will be starting in the JVM witch will register a node on Zookeeper for this JVM as well as metadata info( process id, server info and so on) and the node will be removed while JVM exiting. It allows customer to specify JVM idle time in case of some realtime senarior like Kafka consumer.?? As mentioned in the previous comment, this would work for trusted UDRs only. For isolated UDRs, we need a separate process. Also, I don't think we want the user to modify JVM options, that is the job of the DBA and would be a security problem. Third, there is only one JVM for a process, so the JVM of DCSServer would be used for the Trafodion engine as well as for trusted UDRs. The JVM gets started before a UDR gets invoked (e.g. for reading metadata), so when we invoke a UDR it's too late anyway to change the JVM parameters. Finally, we already have logic to reuse DCSServers, do we really need additional logic for that type of reuse? ??4. Useful commands on Trafci: list all JVMs in user; kill one of them that no long in use; Restart JVMs with latest JARs and so on.?? The DCS web GUI already lists all the DCSServers. Do we need another command in Trafci? Again, regular users should not be able to kill DCSServers, that's the job of a DBA. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995948#comment-14995948 ] Kevin Xu commented on TRAFODION-1578: - For muti-thread DCS, it's DCSMaster->DCSServer->jdbcT2. For SPJ, it's DCSMaster->DCSServer->UDR->jdbcT2(JAVA-> C++ -> JAVA, why not JAVA -> JAVA). it's not necessary to create a new JVM for each connection. But for single JVM, it should load the JARs dynamicly as what UDR is doing. Contains SQL is trying to reuse JVM, right? As i mentioned, there is a customized idle timeout( contains non-stop) for standing by. yes, the users can modify JVM options directly, but it will have limitation on customized options like max heap size of JVM, the number of connections for one user. And also, allowed to start/restart. i'd like to do an experiment on a new command 'agent'. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996155#comment-14996155 ] Hans Zeller commented on TRAFODION-1578: The JVMs used for the Trafodion server serve mainly the Trafodion engine and they need to execute multiple UDRs, since we don't know which UDRs will be executed when we start the master executor: 1. User connecs. A JVM is reused or started for the Trafodion engine. We don't know yet which UDRs the user will execute. 2. User calls SPJ a. At this time the JVM is already running. 3. User calls TMUDF b. Now we have to reuse the same JVM, if we don't use an MXUDR server. Are you saying that a user should specify at connect time how much heap size they want to have in the JVM used in their connection? We could probably do a limited set of options like that. Often it's probably the creator of a UDR, not the user of it, who may want to specify heap size and other parameters of the JVM used to execute a UDR. End users may not want to have to worry about that. Having a separate JVM for the UDRs should help avoid conflicts between Trafodion and UDRs. Thanks for your patience and your explanations, I hope I understand your proposal to use sandboxing with a SecurityManager and setting JVM parameters a bit better and it makes more sense to me now. I still have questions, though, see comment on the thread above, about the SecurityManager being too restrictive for many UDRs. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995560#comment-14995560 ] Hans Zeller commented on TRAFODION-1578: To add a bit more info about the security issues with UDRs (User-defined Routines): There are trusted and isolated UDRs: Trusted UDRs run in the Trafodion engine and can bypass any security rules like GRANT/REVOKE. Therefore, only trusted users can write such UDRs. Isolated UDRs run in a separate process under a different user id and should not be able to break security rules. Right now, Trafodion has neither. UDRs run in a separate process, but that process runs under the Trafodion id. That will need to change and we can also implement trusted UDRs. Another issue is spoofing. We need to make sure that an attacker can't execute arbitrary code through the UDR mechanism. Our plan to solve that is by doing the following: - Require that UDR libraries reside in a directory that is controlled by Trafodion, such that users other than the Trafodion id can't directly add or modify them. - Require a special privilege to create trusted and another (lesser) privilege to create isolated libraries. Creating a library involves taking user code in form of a file in HDFS, a URL, or a local file on the server. Venkat is planning to add a GUI function to provide a client file as the code of a library. - Have one or more special user ids that execute UDRs, don't run the MXUDR process under the Trafodion id as is done today. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996143#comment-14996143 ] Kevin Xu commented on TRAFODION-1578: - One JVM(process) per Trafodion user by default. User knows how much heap size they need well than us. By the way, it may set limitations on each JVM and only some authorized options can be modified. i'm not saying that it'll have same JVM with DCSServer. DCSServer will create a JVM with a customized IDLE timeout(user may make decision that how long it is existing, maybe never stop). As i said, the new path will be DCSMaster->DCSServer->create new JVM for current user if not exist. Nothing about UDR. Yes, regular users dont have permission to kill DCSServers. They have permission to start/stop SPJ JVM( one for each user by default). > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996115#comment-14996115 ] Kevin Xu commented on TRAFODION-1578: - For authority, the checking should be done in DCSServer( check whether current user has privilege to call) by querying metadata tables. One JVM(process) per user by default. For arbitrary code, JVM SecurityManager should be helpful. 1. FilePermission in SecurityManager should be helpful. 2. It should be solved if started JVM only has specified folder priviledge. Eventually, the JARs should be existing in the running node and it's fine to copy from HDFS at the first time. It's very good to hear that Venkat will add a GUI function. 3. If JVM behaves are under control, it's not necessary to run a process with a special user ids(one user id per connection? if not, it will impact each other. Otherwise, you will have many user ids to be created.) Refs: https://en.wikipedia.org/wiki/Java_security#Security_manager http://docs.oracle.com/javase/7/docs/technotes/guides/security/overview/jsoverview.html > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996145#comment-14996145 ] Hans Zeller commented on TRAFODION-1578: Thanks for the explanation and the link, yesterday I didn't realize that the SecurityManager can also be used to sandbox a piece of Java code. I could imagine that this could work for simple functions, especially for scalar UDFs that usually do simple, localized operations. I don't think it would work for TMUDFs. It might work for SPJs that do only Trafodion SQL and other simple operations. What about Java UDRs that want to access HDFS or HBase or other JDBC drivers? What about UDRs that call other code like Spark, Kafka or MongoDB? > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990060#comment-14990060 ] Suresh Subbiah commented on TRAFODION-1578: --- The JAR perform process is slightly different for CALL statements as described above. Currently what we have is DcsMaster->DcsServer->mxosrvr (these three associations occur when the trafci, JDBC (T4), ODBC app connects) When the first CALL statement is made from this connection an MXUDR process is associated with mxosrvr. mxosrvr acts as master executor for the CALL statement. MXUDR process will host the jvm that will execute the SPJ. If the SPJ contains SQL, then the mxudr process itself acts as a master executor (with JDBC T2). Mxudr processes can be reused between different statements in the same connection. In this case the same JVM will be used. In general JVM management (for those used by SQL queries) is done by the Trafodion engine. Is the proposal here suggesting that users may want to control that more directly? Thank you. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987928#comment-14987928 ] Venkat Muthuswamy commented on TRAFODION-1578: -- 1. I suggest that we keep the interface generic not just for JARs. It should allow to deploy UDF files too. 2. Storing SPJs in user specific directories can be a problem. You cannot share it with other users. Often times, a SPJ developer or admin will deploy the SPJs and other users/application will call the SPJ. So limiting access to the user who deployed it will not work. The security concerns can be addressed using the SQL privileges on the Library object which encapsulates the jar or udf file. We had a similar design in the pre-apache product and we had to revert to using a single directory or store because it became too cumbersome to manage between roles and users and public. 3. We should keep in mind the ODBC buffer limits and the size of the jar. The client (TrafCI) here will have to send the jar contents in chunks and the SPJ has to assemble them into the single LOB or HDFS file. 4. I have a security concern about the additional commands being proposed: "list all JVMs in user; kill one of them that no long in use; Restart JVMs with latest JARs and so on". How will you enforce security here. Consider leveraging and extending the Trafodion REST server for these kind of manageability functions. These features do not seem to belong in TrafCI. 5. Make sure the proposed interfaces prevent any Denial of Service attacks and adds necessary checks to limit size of the file etc.. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1578) Proposal for SPJ management
[ https://issues.apache.org/jira/browse/TRAFODION-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987369#comment-14987369 ] liu ming commented on TRAFODION-1578: - I like this idea very much. It allows user to deploy SPJ via a simple trafci command, since itself is a SPJ, so it is also very easy to integrated into a GUI manager. Need to discuss a little more: 1. Put DLL/jar as LOB or as HDFS file? HDFS file seems very good to me. 2. Location of DLL/jar at server side. $MY_SQROOT/export/lib/udr/cache is a good place, and as Kevin proposed, there are benefit to put DLL/jar files in directories per user. So we can have finer control, user A cannot call user B’s SPJ for example. So maybe we can upload the jar to $MY_SQROOT/export/lib/udr/cache/$USER If the SPJ/UDR need extra DLL/jar files, we can simply put into $MY_SQROOT/export/lib along with the SPJ jar, $MY_SQROOT/export/lib is already in CLASSPATH and LD_LIBRARY_PATH. Or we can ask the developer of UDR/SPJ to pack all required DLL/jar into a single file. > Proposal for SPJ management > --- > > Key: TRAFODION-1578 > URL: https://issues.apache.org/jira/browse/TRAFODION-1578 > Project: Apache Trafodion > Issue Type: Improvement > Components: connectivity-dcs >Reporter: Kevin Xu > > JAR upload process: > 1. Initialize JAR upload procedure by default > 2. JAR upload by Trafci(add library LIB_NAME JAR_LOCAL_PATH). Upload and > create library will be done here. And also, you can only upload the JARs by > UPLOAD command on Trafci that it will not create a lib. >Tips: Before put the JAR into HDFS check MD5 first, if the file exists, > only add a record in metadata table in case users upload the same JAR many > times on platform. > 3. On server-side, the JAR will store in HDFS. At the same time JAR > metadata(path in HDFS, MD5 of the file, and others) stores in store procedure > metadata table. > 4. create procedure is the same as now. > JAR perform process: > 1. Send a CALL by Trafci/JDBC/ODBC/ADO.NET. > 2. DCSMaster assign a DCSServer for the CALL. > 3. DCSServer start a JVM for the user. User can modify JVM options, program > properties and JAVA classpath. At the same time, a monitor class will be > starting in the JVM witch will register a node on Zookeeper for this JVM as > well as metadata info( process id, server info and so on) and the node will > be removed while JVM exiting. It allows customer to specify JVM idle time in > case of some realtime senarior like Kafka consumer. > 4. Useful commands on Trafci: list all JVMs in user; kill one of them that no > long in use; Restart JVMs with latest JARs and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)