Have you tried storing the data as a vector and the linear index of each item
in another vector?
The final step would be to fill the data into a sparse vector and then reshape it
to rank 2.
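Here is a minimal J sketch of that idea on hypothetical toy data: a 3 x 4 table with three filled cells. The names `shape`, `vals`, `lin` and `sm` are illustrative. While scanning, only two flat vectors grow, and the rank-2 sparse array is built once at the end. In place of reshaping a sparse vector, this variant decodes the linear indexes back into row/column pairs with `#:` and amends a rank-2 sparse array directly. That is a substitution, not part of the suggestion, and it assumes your J version's sparse amend accepts a list of boxed coordinates.

```j
NB. Hypothetical toy data: a 3 x 4 table with three filled cells.
shape=: 3 4
vals=: 10 20 30                       NB. values, in the order they were read
NB. row-major linear index of each value: row*4 + col
NB. (in the quoted loop, hypothetically:  lin=. lin , (row * col_tally) + cols)
lin=: shape #. 0 1 , 1 2 ,: 2 3
NB. Final step, done once: decode linear indexes to row,col pairs with #:
NB. and amend them into an empty rank-2 sparse array (sparse element 0).
sm=: vals (<"1 shape #: lin)} 1 $. shape ; 0 1 ; 0
$.^:_1 sm                             NB. dense view, for inspection only
```

The design point is that appending to two plain vectors is cheap, and the sparse structure is only touched in one bulk amend.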

However, 37% filled is not that sparse; IMO sparse means at most a few
percent filled. Using a sparse array at 37% is overkill. What you need is
more physical RAM.
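Here is a rough check of the overkill point. It assumes 64-bit J, where each index and each integer value takes one word, and it ignores header overhead. An integer matrix with both axes sparse stores a row index, a column index and a value for every filled cell, which is three words. The dense form stores one word per cell. So the sparse form only saves space below roughly one-third fill. The sketch counts stored entries via `4 $.` (index matrix) and `5 $.` (values) on hypothetical 1000 x 1000 matrices. The names `cost`, `d50` and `d1` are illustrative.

```j
NB. Hypothetical 1000 x 1000 integer matrices at two fill levels.
d50=: 1000 1000 $ 0 7             NB. every other cell non-zero: 50 % filled
d1=:  1000 1000 $ 7 , 99 $ 0      NB. 1 cell in 100 non-zero: 1 % filled
NB. cost y: entries stored by the all-axes-sparse form of y
NB. (index matrix 4 $. plus values 5 $.), followed by the dense cell count.
cost=: 3 : 0
 s=. $. y
 ((*/ $ 4 $. s) + # 5 $. s) , */ $ y
)
cost d50
cost d1
```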


On Tue, 11 Oct 2022 at 1:29 AM David Lambert <b49p23t...@gmail.com> wrote:

> This was my original code, which ran 'forever'. Were the amendments
> truly in place? One array is sparse, the other pre-allocated. Comments
> show observations in Task Manager. I don't have much experience with
> Windows beyond the usual Office programs. (PS. I now realize I could
> have discarded the "T" and stored the remaining number in the sparse
> array.)
>
> NB. According to Task Manager, 7 GB free (of 16 GB)
> NB. and the J process steadily fluctuated by 10 MB.
>
> NB. Is this a mapped-file issue on Windows 10,
> NB. or were the amendments not in place?
>
> NB. file detail
> NB. fields   into    rows * columns
> NB. 67653078 *inv 1183748 *    2141
> NB. 37.4618 cells per field, i.e. about 2.67 percent filled
>
> NB. c:/Users/user/Downloads/j904_win64/j904/bin/jconsole.exe
> NB. JVERSION
> NB. Engine: j904/j64avx/windows
> NB. Beta-e: commercial/2022-07-16T19:25:02
> NB. Library: 9.04.03
> NB. Platform: Win 64
> NB. Installer: J904 install
> NB. InstallPath: c:/users/user/downloads/j904_win64/j904
> NB. Contact: www.jsoftware.com
>
> require 'jmf'
>
> testfile=:'c:/Users/user/temp/tc.csv'
> datafile=:'c:/Users/user/ZW/kaggle.com/bosch-production-line-performance/train_categorical.csv'
>
> NB. x indexes rows: character positions of row x, LF excluded;
> NB. e.g. INF {~ 0 indexes rows gets the data of the first row
> indexes=: (>:@{. + [: i.@<: -~/)@({ ~ 0 1&+)~
>
> tokenize=: 3 :0  NB. y is the literal
>   rows=. _1 , I. LF = y
>   row_tally=. <: # rows
>   row=. col=. 0
>   k=. _1  NB. current data index
>   col_tally=. >: +/ ',' = y {~ 0 indexes rows  NB. tally of columns
>   data=: a: $~ col_tally + +/ 'T' = y  NB. columns + those with data, skipping ID
>   NB. coor shall be sparse
>   NB. coor=. ((<: # rows) , col_tally) $ _1
>   coor=: 1 $. ((<: # rows) , col_tally) ; 0 1 ; _1 NB. coordinates of data
>   while. row < 9 >. row_tally do.
>    fields=. ([: <;._2 ,&',') y {~ row indexes rows
>    cols=. }. I. a: ~: fields  NB. indexes of data in row excluding ID
>    po=. (>: k) + i. # cols    NB. positions of these items in data
>    co=. < row ; cols          NB. location in sparse array to store po
>    da=. cols { fields
>
>
>    NB. coor is sparse
>    coor=: po co} coor   NB.NB.NB. assignments in place?
>    NB. data is preallocated
>    data=: da po} data   NB.NB.NB. assignments in place?
>
>
>    k=. k + # cols
>    row=. >: row
>   end.
>   'data and coor are global'
> )
>
>
> JCHAR map_jmf_'INF';testfile ] datafile  NB. ] selects datafile; use [ for testfile
>
> tokenize INF
>
> unmap_jmf_'INF'
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
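For anyone reading the quoted `tokenize`, here is a small check of what `indexes` and the field-splitting lines compute. It runs on a hypothetical two-row string, not the Kaggle file. The names `txt`, `r0` and `r1` are illustrative; the others follow the quoted code.

```j
NB. Same verb as in the quoted code ("{ ~" written as "{~").
indexes=: (>:@{. + [: i.@<: -~/)@({~ 0 1&+)~
txt=: 'ID1,T2,',LF,'ID2,,T9',LF       NB. hypothetical two-row CSV text
rows=: _1 , I. LF = txt               NB. _1, then the position of every LF
NB. x indexes rows: positions strictly between boundaries x and x+1
r0=: txt {~ 0 indexes rows
r1=: txt {~ 1 indexes rows
NB. append a comma so <;._2 keeps the last field, then box each field
fields=: ([: <;._2 ,&',') r1
NB. indexes of non-empty fields, dropping the first (the ID column, assumed non-empty)
cols=: }. I. a: ~: fields
```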