Our Loader class is pretty straightforward too. Here it is:

class CadastralLoader(registro: org.apache.spark.sql.Row) extends org.apache.fluo.api.client.Loader {
  override def load(tx: org.apache.fluo.api.client.TransactionBase,
                    context: org.apache.fluo.api.client.Loader.Context): Unit = {
    // NUM_CPF is used as the Fluo row ID
    val rowID = registro.getAs("NUM_CPF").toString
    // HERE WE ITERATE over the remaining 61 COLUMNS
    registro.schema.fieldNames.foreach { fn =>
      if (fn != "NUM_CPF") {
        val column = new org.apache.fluo.api.data.Column("cadastral_col_fam", fn)
        val value = registro.getAs(fn).toString
        tx.set(rowID, column, value)
      }
    }
  }
}
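For context, this is roughly how we submit those loaders from Spark, following the "executing load transactions in Spark" strategy from the blog post. It is a simplified sketch: the fluo-conn.properties path and the cadastralDF name are placeholders for what we actually use.

import java.io.File

import org.apache.fluo.api.client.FluoFactory
import org.apache.fluo.api.config.FluoConfiguration
import org.apache.spark.sql.DataFrame

// Sketch: submit one CadastralLoader per row from each Spark partition.
// The properties file must be readable on every executor (placeholder path).
def loadCadastral(cadastralDF: DataFrame): Unit = {
  cadastralDF.foreachPartition { rows =>
    val fluoConfig = new FluoConfiguration(new File("fluo-conn.properties"))
    val client = FluoFactory.newClient(fluoConfig)
    try {
      val loaderExecutor = client.newLoaderExecutor()
      try {
        // Each execute() queues one load transaction; the executor runs them
        // with the configured loader thread count and queue size.
        rows.foreach(row => loaderExecutor.execute(new CadastralLoader(row)))
      } finally {
        loaderExecutor.close() // waits for queued loaders to finish
      }
    } finally {
      client.close()
    }
  }
}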
Alan Camillo
BlueShift | IT Director
Cel.: +55 11 98283-6358
Tel.: +55 11 4605-5082

2018-03-13 15:53 GMT-03:00 Mike Walch <mwa...@apache.org>:

> If you are running without workers, the problem is probably in your
> Spark/Loader process, as the Oracle process is pretty simple and
> lightweight. If your Loader process isn't stuck (from checking it with
> jstack), this could be due to collisions. Fluo metrics will report the
> number of collisions.
>
> On Tue, Mar 13, 2018 at 2:36 PM, Alan Camillo <a...@blueshift.com.br>
> wrote:
>
> > Some more important information:
> >
> > - We tested with *no observers*, consequently *no workers*
> > - We just need to know whether the Loader/Oracle will handle this
> > quantity of transactions and how long it will take.
> >
> > Alan Camillo
> > BlueShift | IT Director
> > Cel.: +55 11 98283-6358
> > Tel.: +55 11 4605-5082
> >
> > 2018-03-13 15:29 GMT-03:00 Alan Camillo <a...@blueshift.com.br>:
> >
> > > Great, Mike!
> > > Thank you both for the suggestions. I'll try to implement the ideas.
> > >
> > > A little more about the scenario:
> > >
> > > - We are using version 1.2 of Fluo
> > > - Spark is at version 1.6 (unfortunately) with JDK 1.8
> > > - and Accumulo is at version 1.7.
> > >
> > > When we try fewer messages, everything goes well.
> > > I'll let you know about any results.
> > >
> > > Alan Camillo
> > > BlueShift | IT Director
> > > Cel.: +55 11 98283-6358
> > > Tel.: +55 11 4605-5082
> > >
> > > 2018-03-13 15:04 GMT-03:00 Mike Walch <mwa...@apache.org>:
> > >
> > >> I opened a PR to add some troubleshooting docs to the website.
> > >>
> > >> https://github.com/apache/fluo-website/pull/142
> > >>
> > >> On Tue, Mar 13, 2018 at 10:59 AM, Keith Turner <ke...@deenlo.com>
> > >> wrote:
> > >>
> > >> > On Tue, Mar 13, 2018 at 7:11 AM, Alan Camillo <a...@blueshift.com.br>
> > >> > wrote:
> > >> > > Hey fellas!
> > >> > > Sorry to demand so much from you, but we are really trying to put
> > >> > > Fluo to work here and we are facing some issues.
> > >> > >
> > >> > > Recently we decided to use Apache Spark to start the process of
> > >> > > ingesting 300 million lines with 62 columns each.
> > >> > >
> > >> > > We studied https://fluo.apache.org/blog/2016/12/22/spark-load/
> > >> > > carefully and decided to implement the first strategy described
> > >> > > there: executing load transactions in Spark.
> > >> > >
> > >> > > That way we could reuse the code we built for the application
> > >> > > transactions. But it is not going well. Fluo stops inserting after
> > >> > > a while and we are not able to tell why.
> > >> > > We tried adjusting the loader queue and size to see what happens,
> > >> > > but nothing really helps.
> > >> > > I need help debugging Fluo and understanding what's going on. Can
> > >> > > someone point me in a direction?
> > >> >
> > >> > Can you jstack the spark process a few times and see if Fluo code is
> > >> > stuck anywhere?
> > >> >
> > >> > > Thanks!
> > >> > > Alan Camillo