The problem, of course, is that the small example doesn't
really convey the complexity of the application.

On Wed, 2004-05-05 at 06:28, Gordon McMillan wrote:
> Why filter the selected view? Now you've got the indices in the selected view, 
> and getting the true indices in the original view takes something like this:
> iv1 = vw.indices(subvw)
> iv2 = iv1.remapwith(rs)
> vw.remove(iv2) 

The reason for the select is that I needed my results today ;-)
Using just filter, it takes around 1.7 seconds to examine one subset
of data.  Performing a select first and then filtering for records
of interest took about 1.1 seconds.  I may need to perform this
operation two to twenty times per selected area, and the incremental
cost per filter operation on the selected data turned out to be
about 0.05 seconds.  So in short, for four passes per data set:
1.1 + 3*0.05 = 1.25 seconds vs. 4*1.7 = 6.8 seconds.
There are 20000 to 52000 of these data sets collected each day,
so the rough math shows:
52000*1.25/60/60 = 18 hrs vs. 52000*6.8/60/60 = 98 hrs of filtering
(and before anyone asks, the data sets are financial, not physics
related).
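
The rough math above can be re-run for other pass counts with a small
script.  This is only a sketch of the estimate: the constant and function
names are made up for illustration, and the timings are the ones quoted
above, not new measurements.

```python
# Rough cost model using the timings quoted above (all values in seconds).
FILTER_ONLY = 1.7          # one filter pass over the full view
SELECT_THEN_FILTER = 1.1   # select plus the first filter on the selection
EXTRA_FILTER = 0.05        # each additional filter on the selected data
PASSES = 4                 # one initial pass plus three more, as in the text
DATASETS_PER_DAY = 52000   # upper end of the quoted range

per_set_select = SELECT_THEN_FILTER + (PASSES - 1) * EXTRA_FILTER
per_set_filter = PASSES * FILTER_ONLY


def hours_per_day(seconds_per_set, datasets=DATASETS_PER_DAY):
    """Convert a per-data-set cost in seconds into hours per day."""
    return seconds_per_set * datasets / 3600.0


print("select first: %.2f s/set, %.1f h/day"
      % (per_set_select, hours_per_day(per_set_select)))
print("filter only:  %.2f s/set, %.1f h/day"
      % (per_set_filter, hours_per_day(per_set_filter)))
```

Changing PASSES to 20 shows how quickly the filter-only cost grows
compared with the select-first approach.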

I initially tried your suggestion above, but mistakenly
used the wholly filtered view:
 f1  f2  seq  state
 --  --  ---  -----
  3  3    99  F
 --  --  ---  -----
 Total: 1 rows
Segmentation fault (core dumped)

A little unfriendly for a Python environment.  I don't have
time to dig in, but the stack dump is below if anyone is
curious.  A metakit.dump of the iv1 returned:

 index
 -----
    -1
    -1
    -1
 -----
 Total: 3 rows

Note this was after three attempts to run the program.  IMHO it
would have been better to throw an exception instead of returning
data that could cause a catastrophic error or data loss: -1 would
mean the last row if interpreted by the Python code.  This is
probably already on someone's plate waiting for cycles to get
to it, so please consider this in the spirit of a data point and
not a criticism.
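
To illustrate the -1 concern in plain Python (this does not use Metakit,
and checked_indices is a hypothetical helper, not part of any library),
a guard along these lines would turn the bad indices into an exception
before they reach a remove:

```python
def checked_indices(indices, nrows):
    """Return indices as a list, raising if any is outside 0..nrows-1."""
    result = []
    for i in indices:
        if not 0 <= i < nrows:
            raise IndexError("row index %r is out of range for %d rows"
                             % (i, nrows))
        result.append(i)
    return result


rows = ["a", "b", "c"]
# Python sequences quietly treat -1 as "the last element":
print(rows[-1])
# With the guard, a list such as [-1, -1, -1] raises IndexError instead.
print(checked_indices([0, 2], len(rows)))
```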

Switching back to the select first, then filtered view, your
suggestion worked.  Thank you very much; I had not considered
applying the filter result to an indices view from the
select.  Because I keep the data views conceptually separate, this
would not have occurred to me.
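
For anyone following along, the index bookkeeping can be modelled with
plain Python lists.  This is only a sketch of the idea: the lists stand
in for views, the sample rows are invented, and none of it calls the
Metakit API.

```python
# Each row is (f1, state); plain lists stand in for views here.
vw = [(1, "A"), (3, "F"), (2, "F"), (3, "A"), (3, "F")]
pickon = 3

# "select": positions in vw of the rows with f1 == pickon (the indices view).
iv1 = [i for i, row in enumerate(vw) if row[0] == pickon]
subvw = [vw[i] for i in iv1]

# "filter": positions in subvw of the rows in state 'F'.
rs = [j for j, row in enumerate(subvw) if row[1] == "F"]

# "remap": translate positions in subvw back to positions in vw.
iv2 = [iv1[j] for j in rs]

# Remove from the highest index down so earlier deletions
# do not shift the positions of later ones.
for i in sorted(iv2, reverse=True):
    del vw[i]

print(iv2)
print(vw)
```

The key step is the remap: rs only makes sense relative to subvw, so it
has to be pushed through iv1 before it can be applied to vw.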

I will keep this in mind for the future.  Even at 1.25 seconds
this was close to intolerable, so I switched to BFI on the
sorted data (reading the whole data set once and keeping track of
the state of everything).  It's down to 0.65 seconds, which is
still higher than I would like; I may have to investigate other
alternatives.
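
A single-pass scan of that kind might look roughly like the sketch
below.  The record layout (key, seq, state), the sample data, and the
final_states name are all assumptions made for illustration; this is
not the actual application code.

```python
def final_states(records):
    """One pass over records sorted by seq; keep the last state seen per key."""
    latest = {}
    for key, seq, state in records:
        latest[key] = state   # later records overwrite earlier ones
    return latest


# Hypothetical records, already sorted by seq.
records = [(3, 97, "A"), (2, 98, "A"), (3, 99, "F")]
states = final_states(records)

# Keys whose most recent state is 'F'.
finished = sorted(k for k, s in states.items() if s == "F")
print(finished)
```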

gdb /usr/bin/python2.2 core.15440
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging
symbols found)...
 
warning: core file may not match specified executable file.
Core was generated by `python wkv.py'.
Program terminated with signal 11, Segmentation fault.
<snip>
Loaded symbols for /usr/lib/python2.2/lib-dynload/strop.so
#0  c4_ColOfInts::Get_8i(int) (this=0x8178da4, index_=3) at
column.inl:55
55      column.inl: No such file or directory.
        in column.inl
(gdb)
Current language:  auto; currently c++
(gdb) info stack
#0  c4_ColOfInts::Get_8i(int) (this=0x8178da4, index_=3) at
column.inl:55
#1  0x4009c983 in c4_ColOfInts::Get(int, int&) (this=0x1,
index_=135766136, [EMAIL PROTECTED]) at ../src/column.cpp:1264
#2  0x400a6490 in c4_FormatX::Get(int, int&) (this=0x8178d98,
index_=135766136, [EMAIL PROTECTED]) at ../src/format.cpp:127
#3  0x400aa093 in c4_Handler::GetBytes(int, c4_Bytes&, bool) (this=0x1,
index_=135766136, [EMAIL PROTECTED], copySmall_=false)
    at ../src/handler.cpp:58
#4  0x400bb15b in c4_Sequence::Get(int, int, c4_Bytes&) (this=0x817a020,
index_=1, propId_=135766136, [EMAIL PROTECTED])
    at ../src/viewx.cpp:356
#5  0x400b78a3 in c4_View::GetItem(int, int, c4_Bytes&) const
(this=0x817a020, row_=1, col_=135766136, [EMAIL PROTECTED])
    at mk4.inl:28
#6  0x4009e86e in c4_RemapWithViewer::GetItem(int, int, c4_Bytes&)
(this=0x817a020, row_=135761316, col_=135766136,
    [EMAIL PROTECTED]) at ../src/custom.cpp:425
#7  0x4009da9c in c4_CustomSeq::DoGet(int, int, c4_Bytes&) const
(this=0x1, row_=135766136, col_=135766136, [EMAIL PROTECTED])
    at ../src/custom.cpp:170
#8  0x4009d557 in c4_CustomHandler::Get(int, int&) (this=0x81797d8,
index_=135766136, [EMAIL PROTECTED]) at ../src/custom.cpp:65
#9  0x400aa093 in c4_Handler::GetBytes(int, c4_Bytes&, bool) (this=0x1,
index_=135766136, [EMAIL PROTECTED], copySmall_=true)
    at ../src/handler.cpp:58
#10 0x400a3707 in c4_SortSeq::LessThan(long, long) (this=0x8189670, a=1,
b=0) at ../src/derived.cpp:509
#11 0x400a38f9 in c4_SortSeq::MergeSortThis(long*, int, long*)
(this=0x8189670, ar=0x81390d8, size=1, scratch=0x817b168)
    at ../src/derived.cpp:558
#12 0x400a3b1e in c4_SortSeq::MergeSort(long*, int) (this=0x817a078,
ar=0x81390d8, size=2) at ../src/derived.cpp:641
#13 0x400a3f25 in c4_SortSeq (this=0x81390e0, [EMAIL PROTECTED],
down_=0x0) at univ.inl:105
#14 0x400a4c4e in f4_CreateSort(c4_Sequence&, c4_Sequence*)
([EMAIL PROTECTED], down_=0x817a078) at ../src/derived.cpp:994
#15 0x400b806f in c4_View::Sort() const (this=0x817a078) at
../src/view.cpp:448
#16 0x40098589 in PyView::remove(PyView const&) (this=0x8177e08,
[EMAIL PROTECTED]) at ../python/PyView.cpp:1319
#17 0x40095958 in PyView_remove (o=0x817a078, _args=0x817a078) at
../python/PyView.cpp:760
#18 0x080d1a79 in PyCFunction_Call ()
#19 0x0807b1c0 in PyEval_EvalCode ()
#20 0x0807bd7e in PyEval_EvalCodeEx ()


> 
> Why not
> dead = vw.filter(lambda row: row.f1==pickon and row.state == 'F')
> vw.remove(dead)

See above.

> 
> > subsubvw = subvw.remapwith(rs)
> > metakit.dump(subsubvw)
> > #
> > ## Now I want remove the filter rows, how can I back track?
> > #
> > ## If I use
> > ##vw.remove(rs)
> > ## It only works if order is perfect
> > #
> > ## The delete and remove methods are not available on the derived views
> > ##subvw.remove(rs)
> > ##subsubvw.remove(rs)
> > #
> > vw.remove(vw.indices(subvw))
> > # Deletes all 'pickon' records, not just the ones in state 'F'

_____________________________________________
Metakit mailing list  -  [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit
