Our Loader class is pretty straightforward too. Here it is:

class CadastralLoader(registro: org.apache.spark.sql.Row)
    extends org.apache.fluo.api.client.Loader {

  override def load(tx: org.apache.fluo.api.client.TransactionBase,
                    context: org.apache.fluo.api.client.Loader.Context): Unit = {
    // Use the CPF number as the Fluo row ID
    val rowID = registro.getAs[Any]("NUM_CPF").toString

    // Here we iterate over the remaining 61 columns
    registro.schema.fieldNames.foreach { fn =>
      if (fn != "NUM_CPF") {
        val column = new org.apache.fluo.api.data.Column("cadastral_col_fam", fn)
        val value = registro.getAs[Any](fn).toString
        tx.set(rowID, column, value)
      }
    }
  }
}
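
For reference, here is a minimal sketch of how a loader like this can be submitted from Spark with a LoaderExecutor, following the pattern from the spark-load blog post. The DataFrame name, application name, ZooKeeper connect string, and loader thread/queue settings below are just placeholders, not our real values:

df.rdd.foreachPartition { partition =>
  // One Fluo client and LoaderExecutor per partition
  val conf = new org.apache.fluo.api.config.FluoConfiguration()
  conf.setApplicationName("cadastral")             // placeholder
  conf.setInstanceZookeepers("zkhost:2181/fluo")   // placeholder
  conf.setLoaderThreads(20)                        // tune as needed
  conf.setLoaderQueueSize(20)                      // tune as needed

  val client = org.apache.fluo.api.client.FluoFactory.newClient(conf)
  try {
    val executor = client.newLoaderExecutor()
    try {
      partition.foreach(row => executor.execute(new CadastralLoader(row)))
    } finally {
      executor.close() // blocks until queued load transactions finish
    }
  } finally {
    client.close()
  }
}

Closing the LoaderExecutor waits for the queued load transactions to complete, so each partition finishes its own loads before the task ends.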



Alan Camillo
BlueShift | IT Director
Cel.: +55 11 98283-6358
Tel.: +55 11 4605-5082

2018-03-13 15:53 GMT-03:00 Mike Walch <mwa...@apache.org>:

> If you are running without workers, the problem is probably in your
> Spark/Loader process as the Oracle process is pretty simple and
> lightweight. If your Loader process isn't stuck (from checking it with
> jstack), this could be due to collisions. Fluo metrics will report the
> number of collisions.
>
> On Tue, Mar 13, 2018 at 2:36 PM, Alan Camillo <a...@blueshift.com.br>
> wrote:
>
> > Some more important information:
> >
> >    - We tested with *no observers*, and consequently *no workers*
> >    - We just need to know whether the Loader/Oracle will handle this
> >    quantity of transactions and how long it will take.
> >
> >
> > Alan Camillo
> > BlueShift | IT Director
> > Cel.: +55 11 98283-6358
> > Tel.: +55 11 4605-5082
> >
> > 2018-03-13 15:29 GMT-03:00 Alan Camillo <a...@blueshift.com.br>:
> >
> > > Great Mike!
> > > Thank you both for the suggestions. I'll try to implement the ideas.
> > >
> > > A little bit more about the scenario:
> > >
> > >    - We are using Fluo version 1.2
> > >    - Spark is version 1.6 (unfortunately) with JDK 1.8
> > >    - and Accumulo version 1.7.
> > >
> > > When we try fewer messages, everything goes well.
> > > I'll let you know about any results.
> > >
> > > Alan Camillo
> > > BlueShift | IT Director
> > > Cel.: +55 11 98283-6358
> > > Tel.: +55 11 4605-5082
> > >
> > > 2018-03-13 15:04 GMT-03:00 Mike Walch <mwa...@apache.org>:
> > >
> > >> I opened a PR to add some troubleshooting docs to the website.
> > >>
> > >> https://github.com/apache/fluo-website/pull/142
> > >>
> > >> On Tue, Mar 13, 2018 at 10:59 AM, Keith Turner <ke...@deenlo.com>
> > wrote:
> > >>
> > >> > On Tue, Mar 13, 2018 at 7:11 AM, Alan Camillo <a...@blueshift.com.br>
> > >> > wrote:
> > >> > > Hey fellas!
> > >> > > Sorry to demand so much from you, but we are really trying to put
> > >> > > Fluo to work here and we are facing some issues.
> > >> > >
> > >> > > Recently we decided to use Apache Spark to start the process of
> > >> > > ingesting 300 million lines with 62 columns each.
> > >> > >
> > >> > > We studied
> > >> > > https://fluo.apache.org/blog/2016/12/22/spark-load/ carefully and
> > >> > > decided to implement the first strategy described: executing load
> > >> > > transactions in Spark.
> > >> > >
> > >> > > That way we could reuse the code we built for the application
> > >> > > transactions. But it is not going well: Fluo stops inserting after
> > >> > > a while and we cannot tell why.
> > >> > > We tried adjusting the loader queue and size to see what happens,
> > >> > > but nothing really helps.
> > >> > > I need help debugging Fluo and understanding what's going on. Can
> > >> > > someone point me in a direction?
> > >> >
> > >> > Can you jstack the spark process a few times and see if Fluo code is
> > >> > stuck anywhere?
> > >> >
> > >> > >
> > >> > > Thanks!
> > >> > > Alan Camillo
> > >> >
> > >>
> > >
> > >
> >
>
